ABSTRACT

The  Kalman  Filter  is  traditionally  viewed  as  a  Prediction-Correction  Filtering 
Algorithm.  In  this  report  we  show  that  it  can  be  viewed  as  a  Bayesian  Fusion 
algorithm  and  derive  it  using  Bayesian  arguments.  We  begin  with  an  outline  of 
Bayes  theory,  using  it  to  discuss  well-known  quantities  such  as  priors,  likelihood 
and  posteriors,  and  we  provide  the  basic  Bayesian  fusion  equation.  We  derive 
the  Kalman  Filter  from  this  equation  using  a  novel  method  to  evaluate  the 
Chapman-Kolmogorov  prediction  integral.  We  then  use  the  theory  to  fuse  data 
from  multiple  sensors.  Vying  with  this  approach  is  Dempster-Shafer  theory, 
which deals with measures of “belief”, and is based on the nonclassical idea
of “mass” as opposed to probability. Although these two measures look very
similar, there are some differences. We point them out by outlining the
ideas  of  Dempster-Shafer  theory  and  presenting  the  basic  Dempster-Shafer 
fusion  equation.  Finally  we  compare  the  two  methods,  and  discuss  the  relative 
merits  and  demerits  using  an  illustrative  example. 

APPROVED  FOR  PUBLIC  RELEASE 




DSTO-TR-1436


Published  by 

DSTO  Systems  Sciences  Laboratory 
P.O.  Box  1500 
Edinburgh,  SA  5111 
Australia 

Telephone:  (08)  8259  5555 
Facsimile:  (08)  8259  6567 

©  Commonwealth  of  Australia  2003 
AR  No.  AR-012-775 
July,  2003 




An Introduction to Bayesian and Dempster-Shafer Data Fusion


EXECUTIVE  SUMMARY 

Data  Fusion  is  a  relatively  new  field  with  a  number  of  incomplete  definitions.  Many  of  these 
definitions  are  incomplete  owing  to  its  wide  applicability  to  a  number  of  disparate  fields. 
We  use  data  fusion  with  the  narrow  definition  of  combining  the  data  produced  by  one  or 
more  sensors  in  a  way  that  gives  a  best  estimate  of  the  quantity  we  are  measuring.  Current 
data  fusion  ideas  are  dominated  by  two  approaches:  Bayes  theory,  and  Dempster-Shafer 
theory.  Bayes  theory  is  based  on  the  classical  ideas  of  probability,  while  Dempster-Shafer 
theory  is  a  recent  attempt  to  allow  more  interpretation  of  what  uncertainty  is  all  about. 

In  this  report  we  will  discuss  the  above  two  philosophies  or  paradigms  that  make  up 
a  large  amount  of  analysis  in  the  subject  as  it  currently  stands,  as  well  as  giving  a  brief 
and  select  review  of  the  literature.  The  oldest  paradigm,  and  the  one  with  the  strongest 
foundation,  is  Bayes  theory,  which  deals  with  probabilities  of  events  occurring,  with  all 
of  the  usual  machinery  of  statistics  at  its  disposal.  We  show  that  the  Kalman  Filter  can 
be  viewed  as  a  Bayesian  data  fusion  algorithm  where  the  fusion  is  performed  over  time. 
One  of  the  crucial  steps  in  such  a  formulation  is  the  solution  of  the  Chapman-Kolmogorov 
prediction  integral.  We  present  a  novel  method  to  evaluate  this  prediction  integral  and 
incorporate  it  into  the  Bayesian  fusion  equations.  We  then  put  it  to  use  to  derive  the 
Kalman  filter  in  a  straightforward  and  novel  way.  We  next  apply  the  theory  in  an  example 
of  fusing  data  from  multiple  sensors.  Again,  the  analysis  is  very  straightforward  and  shows 
the  power  of  the  Bayesian  approach. 

Vying with Bayes theory is Dempster-Shafer theory, which deals with measures
of “belief” as opposed to probability. While probability theory takes it as given that
something  either  is  or  isn’t  true,  Dempster-Shafer  theory  allows  for  more  nebulous  states 
of  a  system  (or  really,  our  knowledge),  such  as  “unknown”.  We  outline  the  ideas  of 
the  Dempster-Shafer  theory,  with  an  example  given  of  fusion  using  the  cornerstone  of  the 
theory  known  as  Dempster’s  rule.  Dempster-Shafer  theory  is  based  on  the  nonclassical  idea 
of  “mass”  as  opposed  to  the  well-understood  probabilities  of  Bayes  theory;  and  although 
the  two  measures  look  very  similar,  there  are  some  differences  that  we  point  out.  We 
then  apply  Dempster-Shafer  theory  to  a  fusion  example,  and  point  out  the  new  ideas  of 
“support”  and  “plausibility”  that  this  theory  introduces. 
1  Introduction


Data  Fusion  is  a  relatively  new  field  with  a  number  of  incomplete  definitions.  Many  of 
these  definitions  are  incomplete  owing  to  its  wide  applicability  to  a  number  of  disparate 
fields.  We  use  data  fusion  with  the  narrow  definition  of  combining  the  data  produced  by 
one  or  more  sensors  in  a  way  that  gives  a  best  estimate  of  the  quantity  we  are  measuring. 
Although  some  of  the  theory  of  just  how  to  do  this  is  quite  old  and  well  established,  in 
practice,  many  applications  require  a  lot  of  processing  power  and  speed:  performance  that 
only  now  is  becoming  available  in  this  current  age  of  faster  computers  with  streamlined 
numerical  algorithms.  So  fusion  has  effectively  become  a  relatively  new  field. 

In  this  report  we  will  discuss  two  of  the  main  philosophies  or  paradigms  that  make 
up  a  large  amount  of  analysis  in  the  subject  as  it  currently  stands,  as  well  as  give  a  brief 
and  select  review  of  the  literature.  The  oldest  paradigm,  and  the  one  with  the  strongest 
foundation,  is  Bayes  theory,  which  deals  with  probabilities  of  events  occurring,  with  all  of 
the usual machinery of statistics at its disposal. Vying with this is Dempster-Shafer theory,
which deals with measures of “belief” as opposed to probability. While probability theory
takes  it  as  given  that  something  either  is  or  isn’t  true,  Dempster-Shafer  theory  allows 
for  more  nebulous  states  of  a  system  (or  really,  our  knowledge),  such  as  “unknown”.  A 
further  paradigm — not  discussed  here — is  fuzzy  logic,  which  in  spite  of  all  of  the  early 
interest  shown  in  it,  is  not  heavily  represented  in  the  current  literature. 


2  A  Review  of  Data  Fusion  Literature 

In this section we describe some of the ways in which data fusion is currently being applied
in several fields. Because fusion ideas are currently heavily dependent on the precise
application  for  their  implementation,  the  subject  has  yet  to  settle  into  an  equilibrium  of 
accepted  terminology  and  standard  techniques.  Unfortunately,  the  many  disparate  fields 
in  which  fusion  is  used  ensure  that  such  standardisation  might  not  be  easily  achieved  in 
the  near  future. 


2.1  Trends  in  Data  Fusion 

To present an idea of the diversity of recent applications, we focus on recent International
Conferences on Information Fusion, by way of a choice of papers that aims to
reflect  the  diversity  of  the  fields  discussed  at  these  conferences.  Our  attention  is  mostly 
confined  to  the  conferences  Fusion  ’98  and  ’99.  The  field  has  been  developing  rapidly, 
so  that  older  papers  are  not  considered  purely  for  reasons  of  space.  On  the  other  hand 
the  latest  conference,  Fusion  2000,  contains  many  papers  with  less  descriptive  names  than 
those  of  previous  years,  that  impart  little  information  on  what  they  are  about.  Whether 
this  indicates  a  trend  toward  the  abstract  in  the  field  remains  to  be  seen. 


Most  papers  are  concerned  with  military  target  tracking  and  recognition.  In  1998 
there  was  a  large  number  devoted  to  the  theory  of  information  fusion:  its  algorithms  and 
mathematical  methods.  Other  papers  were  biased  toward  neural  networks  and  fuzzy  logic. 
Less widely represented were the fields of finance and medicine, air surveillance and image
processing. 

The  cross  section  changed  somewhat  in  1999.  Although  target  tracking  papers  were 
as  plentiful  as  ever,  medical  applications  were  on  the  increase.  Biological  and  linguistic 
models  were  growing,  and  papers  concerned  with  hardware  for  fusion  were  appearing.  Also 
appearing  were  applications  of  fusion  to  more  of  the  everyday  type  of  scenario:  examples 
are  traffic  analysis,  earthquake  prediction  and  machining  methods.  Fuzzy  logic  was  a 
commonly  used  approach,  followed  by  discussions  of  Bayesian  principles.  Dempster-Shafer 
theory  seems  not  to  have  been  favoured  very  much  at  all. 


2.2  Basic  Data  Fusion  Philosophy 

In  1986  the  Joint  Directors  of  Laboratories  Data  Fusion  Working  Group  was  created, 
which subsequently developed the Data Fusion Process Model [1]. This is a plan of the
proposed  layout  of  a  generic  data  fusion  system,  and  is  designed  to  establish  a  common 
language  and  model  within  which  data  fusion  techniques  can  be  implemented. 

The  model  defines  relationships  between  the  sources  of  data  and  the  types  of  processing 
that  might  be  carried  out  to  extract  the  maximum  possible  information  from  it.  In  between 
the  source  data  and  the  human,  who  makes  decisions  based  on  the  fused  output,  there  are 
various  levels  of  processing: 

Source preprocessing: This creates preliminary information from the data that serves
to interface it better with other levels of processing.

Object refinement: The first main level of processing refines the identification of
individual objects.

Situation refinement: Once individual objects are identified, their relationships to each
other need to be ascertained.

Threat refinement: The third level of processing tries to infer details about the future
of the system.

Process refinement: The fourth level is not so much concerned with the data, but rather
with what the other levels are doing, and whether it is or can be optimised.

Data management: The housekeeping involved with data storage is a basic but crucial
task, especially if we are dealing with large amounts of data or complex calculations.

Hall  and  Garga  [1]  discuss  this  model  and  present  a  critique  of  current  problems  in 
data  fusion.  Their  points  in  summary  are: 

•  Many  fused  poor  quality  sensors  do  not  make  up  for  a  few  good  ones. 

•  Errors  in  initial  processing  are  very  hard  to  correct  down  the  line. 

•  It  is  often  detrimental  to  use  well-worn  presumptions  of  the  system:  for  example 
that  its  noise  is  Gaussian. 

•  Much more data must be used for training a learning algorithm than we might at
first  suppose.  They  quote  [2]  as  saying  that  if  there  are  m  features  and  n  classes  to 
be  identified,  then  the  number  of  training  cases  required  will  be  at  least  of  the  order 
of  between  10  and  30  times  mn. 

•  Hall  and  Garga  also  believe  that  quantifying  the  value  of  a  data  fusion  system  is 
inherently  difficult,  and  that  no  magic  recipe  exists. 

•  Fusion  of  incoming  data  is  very  much  an  ongoing  process,  not  a  static  one. 
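
The training-set rule of thumb quoted in the list above is simple arithmetic. A minimal sketch, assuming the bound is read as between 10mn and 30mn cases; the function name and the example values of m and n are hypothetical:

```python
def training_cases_range(m_features, n_classes):
    """Return (low, high) for the quoted rule of thumb: 10*m*n to 30*m*n."""
    if m_features < 1 or n_classes < 1:
        raise ValueError("need at least one feature and one class")
    product = m_features * n_classes
    return 10 * product, 30 * product


# Hypothetical example: 8 features and 5 classes to be identified.
low, high = training_cases_range(8, 5)
print(low, high)
```

Even this modest hypothetical case calls for several hundred training cases, which is the point Hall and Garga make.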

Zou  et  al.  [3]  have  used  Dempster-Shafer  theory  in  the  study  of  reducing  the  range 
errors  that  mobile  robots  produce  when  they  use  ultrasound  to  investigate  a  specular 
environment.  Such  an  environment  is  characterised  by  having  many  shiny  surfaces,  and 
as  a  result,  there  is  a  chance  that  a  signal  sent  out — if  it  encounters  several  of  these 
surfaces — will  bounce  repeatedly;  so  that  if  and  when  it  does  return  to  the  robot,  it  will 
be  interpreted  as  having  come  from  very  far  away.  The  robot  thus  builds  a  very  distorted 
picture  of  its  environment. 

What  a  Bayesian  robot  does  is  build  a  grid  of  its  surroundings,  and  assign  to  each 
point a value of “occupied” (by e.g. a wall) or “empty”. These are mutually exclusive, so
p(occupied) + p(empty) = 1. The Dempster-Shafer approach introduces a third alternative:
“unknown”,  along  with  the  idea  of  a  “mass”,  or  measure  of  confidence  in  each  of  the 
alternatives.  Dempster-Shafer  theory  then  provides  a  rule  for  calculating  the  confidence 
measures  of  these  three  states  of  knowledge,  based  on  data  from  two  categories:  new 
evidence  and  old  evidence. 
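
A minimal sketch of such a rule for a single grid cell, using the standard form of Dempster's rule of combination. The mass values are invented for illustration and are not taken from Zou et al.; “unknown” is modelled here as the set containing both alternatives:

```python
from itertools import product

OCCUPIED = frozenset({"occupied"})
EMPTY = frozenset({"empty"})
UNKNOWN = OCCUPIED | EMPTY  # "unknown": the cell could be either


def combine(old, new):
    """Fuse two mass assignments with Dempster's rule of combination."""
    fused = {}
    conflict = 0.0
    for (a, mass_a), (b, mass_b) in product(old.items(), new.items()):
        common = a & b
        if common:
            fused[common] = fused.get(common, 0.0) + mass_a * mass_b
        else:
            conflict += mass_a * mass_b  # e.g. "occupied" against "empty"
    if conflict >= 1.0:
        raise ValueError("the two sources are in total conflict")
    # Renormalise so that the fused masses again sum to one.
    return {s: m / (1.0 - conflict) for s, m in fused.items()}


# Illustrative masses only.
old_evidence = {OCCUPIED: 0.6, EMPTY: 0.1, UNKNOWN: 0.3}
new_evidence = {OCCUPIED: 0.5, EMPTY: 0.2, UNKNOWN: 0.3}
cell = combine(old_evidence, new_evidence)
for name, key in (("occupied", OCCUPIED), ("empty", EMPTY), ("unknown", UNKNOWN)):
    print(name, round(cell[key], 3))
```

With these numbers the mass on “occupied” rises and the mass on “unknown” falls, since both pieces of evidence lean the same way.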

The  essence  of  Zou’s  work  lies  in  building  good  estimates  of  just  what  the  sensor 
measures  should  be.  That  is  the  main  task,  since  the  authors  show  that  the  results  of 
applying Dempster-Shafer theory depend heavily on the choice of parameters that determine
these measures. Thus for various choices of parameters, the plan built by the robot
varies  from  quite  complete  but  with  additional  points  scattered  both  inside  and  outside 
of  it  (i.e.  probabilities  of  detection  and  false  alarm  both  high),  to  fairly  incomplete,  but 
without  the  extraneous  extra  points  (corresponding  to  probabilities  of  detection  and  false 
alarm  both  low). 

The  final  conclusion  reached  by  Zou  et  al.  is  that  the  parameter  choice  for  quantifying 
the  sensor  measure  is  crucial  enough  to  warrant  more  work  being  done  on  defining  just 
what  these  parameters  should  be  in  a  new  environment.  The  Dempster-Shafer  theory  they 
used  is  described  more  fully  in  Section  4. 

In  reference  [4],  Myler  considers  an  interesting  example  of  data  fusion  in  which 
Dempster-Shafer  theory  fails  to  give  an  acceptable  solution  to  a  data  fusion  problem 
where  it  is  used  to  fuse  two  irreconcilable  data  sets.  If  two  sensors  each  have  strongly 
differing opinions over the identity of an emitter, but agree very weakly on a third
alternative,  then  Dempster-Shafer  theory  will  be  weighted  almost  100%  in  favour  of  that 
third  alternative.  This  is  an  odd  state  of  affairs,  but  one  to  which  there  appears  to  be  no 
easy  solution. 
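
A worked example shows how this arises under Dempster's rule. The masses are invented purely for illustration and are not Myler's figures. Suppose A, B and C are mutually exclusive emitter identities, sensor 1 assigns m_1(A) = 0.99 and m_1(C) = 0.01, and sensor 2 assigns m_2(B) = 0.99 and m_2(C) = 0.01:

```latex
\begin{align*}
K &= m_1(A)\,m_2(B) + m_1(A)\,m_2(C) + m_1(C)\,m_2(B) \\
  &= 0.9801 + 0.0099 + 0.0099 = 0.9999, \\
m_{12}(C) &= \frac{m_1(C)\,m_2(C)}{1 - K} = \frac{0.0001}{0.0001} = 1 .
\end{align*}
```

Here K is the total mass falling on conflicting pairs of identities. Normalising by 1 - K discards that mass, so the only surviving combination, C, receives all of the fused mass even though each sensor gave it only 1%.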

Myler  accepts  this  and  instead  offers  a  measure  of  a  new  term  he  calls  “disfusion”: 
the  degree  to  which  there  is  agreement  among  sensors  as  to  an  alternative  identity  of  the 
target  that  has  not  been  chosen  as  the  most  likely  one.  If  D  is  the  number  of  dissenting 
sensors that disagree with the winning sensor, but agree with each other, and N is the
total number of sensors fused, then the disfusion is defined as

\[
\text{Disfusion} = \frac{D}{N - 1} \tag{2.1}
\]

Thus if all but one sensor weakly identify the target as some X, while the winning sensor
identifies it as Y ≠ X, then D = N − 1 and there is 100% disfusion. Myler contrasts
this  with  “confusion”,  in  which  none  of  the  sensors  agree  with  any  other.  Clearly  though, 
there  are  other  definitions  of  such  a  concept  that  might  be  more  useful  in  characterising 
how  many  sensors  disagree,  and  whether  they  are  split  into  more  than  one  camp. 
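
Equation (2.1) can be sketched in a few lines. The way D is counted below is an assumption, not Myler's stated procedure: D is taken to be the size of the largest group of sensors that share one identity different from the winning sensor's. The function name and sensor labels are hypothetical:

```python
from collections import Counter


def disfusion(identities, winner_index):
    """Disfusion D / (N - 1), as in equation (2.1).

    Assumption: D is the size of the largest group of sensors that report
    one common identity different from the winning sensor's identity.
    """
    n = len(identities)
    if n < 2:
        raise ValueError("need at least two sensors")
    winning_identity = identities[winner_index]
    dissenters = Counter(i for i in identities if i != winning_identity)
    d = max(dissenters.values(), default=0)
    return d / (n - 1)


print(disfusion(["Y", "X", "X", "X"], winner_index=0))  # all others agree on X
print(disfusion(["Y", "Y", "X", "Z"], winner_index=0))  # dissenters are split
```

The second call illustrates the limitation noted above: a single number does not reveal that the dissenting sensors are split into more than one camp.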

However,  Myler’s  paper  gives  no  quantitative  use  for  disfusion,  apart  from  advocating 
its  use  as  a  parameter  that  should  prompt  a  set  of  sensors  to  take  more  measurements  if 
the  disfusion  is  excessive.  This  is  certainly  a  good  use  for  it,  since  we  need  to  be  aware 
that  the  high  mass  that  Dempster-Shafer  will  attribute  to  an  otherwise  weak  choice  of 
target  in  the  above  example  does  not  mean  that  Dempster-Shafer  is  succeeding  in  fusing 
the  data  correctly;  and  there  needs  to  be  an  indicator  built  in  to  the  fusion  system  to  warn 
us  of  that. 


Kokar  et  al.  [5]  bemoan  the  fact  that  at  their  time  of  writing  (early  2000),  data  fusion 
had  not  lived  up  to  its  promises.  They  suggest  that  it  needs  to  be  approached  somewhat 
differently  to  the  current  way,  and  have  described  various  models  that  might  provide  a 
way  forward.  Their  main  suggestion  is  that  a  data  fusion  system  should  not  be  thought 
of  so  much  as  a  separate  system  that  humans  use  to  fuse  data,  but  that  rather  we  should 
be  designing  a  complete  human-automaton  system  with  data-fusion  capability  in  mind. 

This  reference  concentrates  on  describing  various  models  for  ways  to  accomplish  this. 
The  authors  first  describe  a  generic  information-centred  model  that  revolves  around  the 
flow  of  information  in  a  system.  Its  highest  levels  are  dealing  with  sensor  data,  down  to 
the  preliminary  results  of  signal  processing,  through  to  extraction  of  relevant  details  from 
these,  prediction  of  their  states,  and  using  these  to  assess  a  situation  and  plan  a  response. 
These  levels  are  as  described  in  the  Joint  Directors  of  Laboratories  model  on  page  2  of 
this  report. 

Kokar’s  paper  next  describes  a  function-centred  model.  This  is  a  cycle  made  up  of  four 
processes  that  happen  in  temporal  sequence:  collecting  information,  collating  and  sorting 
it  to  isolate  the  relevant  parts,  making  a  decision,  and  finally  carrying  out  that  decision. 
The  results  of  this  then  influence  the  environment,  which  in  turn  produces  more  data 
for  the  cycle  to  begin  anew.  This  model  leads  on  quite  naturally  to  an  object-oriented 
approach,  since  it  implies  a  need  for  objects  to  carry  out  these  activities.  The  strength 
of  this  object-oriented  approach  is  that  it  has  the  potential  to  make  the  code-writing 
implementation  much  easier. 

Kokar  et  al.  emphasise  the  view  that  in  many  data  fusion  systems  humans  must  interact 
with  computers,  so  that  the  ways  in  which  the  various  processes  are  realised  need  to  take 
human  psychology  into  account. 

The  three  main  methods  of  data  fusion  are  compared  in  [6].  In  this  paper,  Cremer 
et  al.  use  Dempster-Shafer,  Bayes  and  fuzzy  logic  to  compare  different  approaches  to  land 
mine  detection.  Their  aim  is  to  provide  a  figure  of  merit  for  each  square  in  a  gridded  map 
of the mined area, where this number is an indicator of the chance that a mine will be
found  within  that  grid  square. 

Each  technique  has  its  own  requirements  and  difficulty  of  interpretation.  For  example, 
Dempster-Shafer and Bayes require a meaning to be given to a detection involving background
noise. We can use a mass assigned to the background as either a rejection of the
background,  or  as  an  uncertainty.  The  fuzzy  approach  has  its  difficulty  of  interpretation 
when  we  come  to  “defuzzify”  its  results:  its  fuzzy  probabilities  must  be  turned  into  crisp 
ones  to  provide  a  bottom  line  figure  of  merit. 

Cremer  et  al.  do  not  have  real  mine  data,  so  rely  instead  on  a  synthetic  data  set.  They 
find  that  Dempster-Shafer  and  Bayes  approaches  outperform  the  fuzzy  approach — except 
for low detection rates, where fuzzy probabilities have the edge. Comparing Dempster-Shafer
and Bayes, they find that there is little to decide between the two, although
Dempster-Shafer  has  a  slight  advantage  over  Bayes. 


2.3  Target  Location  and  Tracking 

Sensor  fusion  currently  finds  its  greatest  number  of  applications  in  the  location  and 
tracking  of  targets,  and  in  that  sense  it  is  probably  still  seen  very  much  as  a  military 
technique  that  is  gradually  finding  wider  application. 

Triesch  [7]  describes  a  system  for  tracking  the  face  of  a  person  who  enters  a  room 
and  manoeuvres  within  it,  or  even  walks  past  another  person  in  that  room.  The  method 
does  not  appear  to  use  any  standard  theory  such  as  Bayes  or  Dempster-Shafer.  Triesch 
builds  a  sequence  of  images  of  the  entire  room,  analysing  each  through  various  cues  such  as 
intensity  profile,  colour  and  motion  continuity.  To  each  metric  are  assigned  a  “reliability” 
and a “quality”, both between zero and one, and set to arbitrary values to begin with. The
data  fusion  algorithm  is  designed  so  that  their  values  evolve  from  image  to  image  in  such 
a  way  that  poorer  metrics  are  given  smaller  values  of  reliability,  and  so  are  weighted  less. 
Two-dimensional  functions  of  the  environment  are  then  produced,  one  for  each  cue,  where 
the  function’s  value  increases  in  regions  where  the  face  is  predicted  to  be.  A  sum  of  these 
functions,  weighted  with  the  reliabilities,  then  produces  a  sort  of  probability  distribution 
for  the  position  of  the  face. 
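
The final weighted-sum step can be sketched as follows. This is only an illustration of the idea, not Triesch's implementation: the cue maps and reliabilities are hypothetical, the normalisation to unit sum is an added assumption, and the rule by which reliabilities evolve from image to image is not shown:

```python
def fuse_cue_maps(cue_maps, reliabilities):
    """Reliability-weighted sum of per-cue 2-D maps, normalised to sum to 1."""
    rows, cols = len(cue_maps[0]), len(cue_maps[0][0])
    fused = [[0.0] * cols for _ in range(rows)]
    for cue, weight in zip(cue_maps, reliabilities):
        for r in range(rows):
            for c in range(cols):
                fused[r][c] += weight * cue[r][c]
    total = sum(sum(row) for row in fused)
    if total == 0:
        raise ValueError("no cue gives any support")
    return [[value / total for value in row] for row in fused]


# Hypothetical 2x3 maps for two cues (say, colour and motion).
colour = [[0.1, 0.8, 0.1], [0.0, 0.2, 0.0]]
motion = [[0.6, 0.3, 0.0], [0.1, 0.0, 0.0]]
fused = fuse_cue_maps([colour, motion], reliabilities=[0.9, 0.2])

# The most likely face position is the cell with the largest fused value.
best = max(((r, c) for r in range(2) for c in range(3)),
           key=lambda rc: fused[rc[0]][rc[1]])
print(best)
```

Because the colour cue carries the larger reliability here, its peak dominates the fused map even though the motion cue peaks elsewhere.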

Each cue has a “prototype vector”: a representation of the face in the parameter space
of  that  cue.  This  prototype  is  allowed  to  evolve  in  such  a  way  as  to  minimise  discordance 
in  the  cues’  outputs.  The  rate  of  evolution  of  the  prototype  is  determined  by  comparing 
the  latest  data  with  the  current  value  of  the  prototype  vector,  as  well  as  incorporating  a 
preset  time  constant  to  add  some  memory  ability  to  the  system’s  evolution. 

The results quoted by Triesch are spread across different regimes and cannot be described
as conclusive. Although higher success rates are achieved when implementing the
algorithm, the highest success occurs when the quality of each cue is constrained to be
constant. Allowing this quality itself to evolve might be expected to give better results,
but in fact it does not. Triesch posits that the reason for this anomalous result is that
the dynamics of the situation, based as they are on a sequence of images, are not as continuous
as they were assumed to be when the rules governing the system’s evolution were
originally constructed. He suggests that more work is needed to investigate this problem.

Schwartz [8] has applied a maximum a posteriori (MAP) approach to the search for
formations of targets in a region, using a model of a battlefield populated by a formation
of vehicles. A snapshot taken of this battlefield yields a map which is then divided into a
grid, populated by spots that might indicate a vehicle, or might just be noise. He starts
with  a  set  of  templates  that  describe  what  a  typical  formation  might  look  like  (based  on 
previously  collected  data  about  such  formations).  Each  of  these  templates  is  then  fitted 
digitally  over  the  grid  and  moved  around  cell  by  cell,  while  a  count  is  kept  of  the  number 
of  spots  in  each  cell.  By  comparing  the  location  of  each  spot  in  the  area  delineated  by  the 
template  to  the  centroid  of  the  spots  in  that  template,  it  becomes  possible  to  establish 
whether  a  particularly  high  density  of  spots  might  be  a  formation  conforming  to  the 
template,  or  might  instead  just  be  a  random  set  of  elements  in  the  environment  together 
with  noise,  that  has  no  concerted  motion. 

The  MAP  approach  to  searching  for  formations  uses  the  Bayesian  expression: 

\[
p(\text{formation} \mid \text{data}) = \frac{p(\text{data} \mid \text{formation})\, p(\text{formation})}{p(\text{data})}
\]

As  mentioned  in  Section  3  (page  12),  the  MAP  estimate  of  the  degree  to  which  a  data 
set  is  thought  to  be  a  formation  is  the  value  of  a  parameter  characterising  the  formation, 
that  maximises  p(formation |  data).  As  is  typical  of  Bayesian  problems,  the  value  of  the 
prior  p(formation)  at  best  can  only  be  taken  to  be  some  constant.  Schwartz  discusses 
statistical  models  for  the  placing  of  spots  in  the  grid.  His  method  does  not  involve  any 
sort  of  evolution  of  parameters;  rather  it  is  simply  a  comparison  of  spot  number  with 
template shapes. Good-quality results are obtained with, and indeed require, many frames; but this
is  not  overly  surprising,  since  averaging  over  many  frames  will  reduce  the  amount  of  noise 
on  the  grid. 
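
The MAP selection step can be sketched as below. The candidate parameter values and likelihood numbers are hypothetical and are not taken from Schwartz; the sketch only shows that with a constant prior the MAP estimate coincides with the maximum of the likelihood:

```python
def map_estimate(likelihood, prior):
    """Return the parameter value that maximises likelihood * prior.

    p(data) is the same for every candidate, so it can be ignored when
    only the location of the maximum is needed.
    """
    return max(likelihood, key=lambda theta: likelihood[theta] * prior[theta])


# Hypothetical values of p(data | formation) for three template spacings.
likelihood = {"spacing_50m": 0.02, "spacing_100m": 0.11, "spacing_200m": 0.05}

# A constant prior: every candidate is equally probable beforehand.
flat_prior = {theta: 1.0 / len(likelihood) for theta in likelihood}

print(map_estimate(likelihood, flat_prior))
```

An informative prior, were one available, could move the maximum to a different candidate; with the flat prior it cannot.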

Fuzzy  logic  is  another  method  that  has  been  used  to  fuse  data.  This  revolves  around 
the  idea  of  a  “membership  function”.  Membership  in  a  “crisp”  set  (i.e.  the  usual  type  of 
set  encountered  in  mathematics)  is  of  course  a  binary  yes/no  value;  and  this  notion  of  a 
one  or  zero  membership  value  generalises  in  fuzzy  set  theory  to  a  number  that  lies  between 
one  and  zero,  that  defines  the  set  by  how  well  the  element  is  deemed  to  lie  within  it. 

These  ideas  are  applied  by  Simard  et  al.  [9]  of  Lockheed  Martin  Canada  and  the 
Canadian Defence Research Establishment, along with a combination of other fusion techniques,
to ship movements in order to build a picture of what vessels are moving in
Canadian  waters.  The  system  they  described  as  of  1999  is  termed  the  Adaptive  Fuzzy 
Logic  Correlator  (AFLC). 

The  AFLC  system  receives  messages  in  different  protocols  relating  to  various  contacts 
made,  by  both  ground  and  airborne  radars.  It  then  runs  a  Kalman  filter  to  build  a  set 
of  tracks  of  the  various  ships.  In  order  to  associate  further  contacts  with  known  tracks, 
it  needs  to  prepare  values  of  the  membership  functions  for  electromagnetic  and  position 
parameters.  For  example,  given  a  new  contact,  it  needs  to  decide  whether  this  might 
belong  to  an  already-existing  track,  by  looking  at  the  distance  between  the  new  contact 
and  the  track.  Of  course,  a  distance  of  zero  strongly  implies  that  the  contact  belongs  with 
the  track,  so  we  can  see  that  the  contact  can  be  an  element  of  a  fuzzy  set  associated  with 
the  track,  where  the  membership  function  should  peak  for  a  distance  of  zero. 
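
A membership function of this kind might look like the sketch below. The Gaussian shape, the scale parameter and the track names are assumptions made for illustration; the source does not state which functional form the AFLC uses:

```python
import math


def track_membership(distance, scale):
    """Degree (0 to 1) to which a contact belongs to a track's fuzzy set.

    Assumed Gaussian-shaped form: 1 at zero distance, falling towards 0.
    """
    if scale <= 0:
        raise ValueError("scale must be positive")
    return math.exp(-0.5 * (distance / scale) ** 2)


def best_track(contact_distances, scale):
    """Pick the track whose membership value for the contact is largest."""
    memberships = {track: track_membership(d, scale)
                   for track, d in contact_distances.items()}
    return max(memberships, key=memberships.get), memberships


# Hypothetical distances (km) from one new contact to two existing tracks.
track, memberships = best_track({"track_1": 0.4, "track_2": 3.0}, scale=1.0)
print(track, memberships)
```

Unlike a crisp yes/no assignment, the contact keeps a small nonzero membership in the more distant track.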




Given  surveillance  data  and  having  drawn  various  tracks  from  it,  the  system  must  then 
consult  a  database  of  known  ships  to  produce  a  candidate  that  could  conceivably  have 
produced  the  track  of  interest.  Electromagnetic  data,  such  as  pulse  repetition  frequency, 
can  also  be  given  a  membership  within  different  sets  of  emitters.  The  ideas  of  fuzzy 
sets  then  dictate  what  credence  we  give  to  the  information  supplied  by  various  radar  or 
surveillance  systems.  Comparing  this  information  for  many  sensors  reduces  to  comparing 
the  membership  function  values  for  the  various  system  parameters. 

Once  we  have  a  candidate  ship  for  any  given  track,  we  need  to  fuse  incoming  data  by 
combining  it  with  the  data  that  already  forms  part  of  the  track  history.  For  example,  the 
AFLC  takes  the  last  ten  contacts  made  and  forms  the  track  history  from  these.  Finally, 
the  output  of  the  AFLC  is  a  map  of  the  region  of  interest  filled  with  tracks  of  ships, 
together  with  their  identifications  if  these  can  be  found  in  the  ship  database. 

As  the  authors  point  out,  the  use  of  fuzzy  logic  is  not  without  its  problems  when 
comparing  different  parameters.  The  membership  function  quantifying  how  close  a  new 
contact is to a track is not related to the membership function for, say, pulse repetition
frequency,  and  yet  these  two  functions  may  well  need  to  be  compared  at  some  point.  This 
comparison  of  apples  with  oranges  is  a  difficulty,  and  highlights  the  care  that  we  need  to 
exercise  when  defining  just  what  the  various  membership  functions  should  be. 

Kewley  [10]  compares  the  Dempster-Shafer  and  fuzzy  approaches  to  fusion,  so  as  to 
decide  which  of  a  given  set  of  emitters  has  produced  certain  identity  attribute  data.  He 
finds  that  fuzzy  logic  gives  similar  results  to  Dempster-Shafer,  but  for  less  numerical  work 
and  complexity.  Kewley  also  notes  that  while  the  Dempster-Shafer  approach  is  not  easily 
able  to  assimilate  additional  emitters  after  its  first  calculations  have  been  done,  fuzzy  logic 
certainly  can. 


It is not apparent that there is any one approach we should take to fuse track data
from  multiple  sensors.  In  reference  [11],  Watson  et  al.  discuss  one  solution  they  have 
developed:  the  Optimal  Asynchronous  Track  Fusion  Algorithm  (OATFA).  They  use  this  to 
study  the  tracking  of  a  target  that  follows  three  constant  velocity  legs  with  two  changes  of 
direction  in  between,  leading  to  its  travelling  in  the  opposite  direction  to  which  it  started. 

The  authors  base  their  technique  on  the  Interacting  Multiple  Model  algorithm  (IMM). 
The  IMM  is  described  as  being  particularly  useful  for  tracking  targets  through  arbitrary 
manoeuvres,  but  traditionally  it  uses  a  Kalman  filter  to  do  its  processing.  Watson  et  al. 
suggest  replacing  the  IMM’s  Kalman  filter  with  their  OATFA  algorithm  (which  contains 
several  Kalman  filters  of  its  own),  since  doing  so  produces  better  results  than  for  the 
straight  Kalman  filter  case.  They  note,  however,  that  this  increase  in  quality  tends  to  be 
confined  to  the  (less  interesting)  regions  of  constant  velocity. 

The  OATFA  algorithm  treats  each  sensor  separately:  passing  the  output  from  each  to 
a  dedicated  Kalman  filter,  that  delivers  its  updated  estimate  to  be  combined  with  those  of 
all  of  the  other  sensor/Kalman  filter  pairs,  as  well  as  feeding  back  to  each  of  the  Kalman 
filters. 


Certainly  the  OATFA  model  departs  from  the  idea  that  the  best  way  to  fuse  data  is 
to  deliver  it  all  to  a  central  fusion  engine:  instead,  it  works  upon  each  sensor  separately. 
Typical  results  of  the  IMM-OATFA  algorithm  tend  to  show  position  estimation  errors  that 
are about half those that the conventional IMM produces, but space and time constraints
make  it  impossible  for  the  authors  to  compare  their  results  with  any  other  techniques. 

Hatch  et  al.  [12]  describe  a  network  of  underwater  sensors  used  for  tracking.  The 
overall  architecture  is  that  of  a  command  centre  taking  in  information  at  radio  frequency, 
from  a  sublevel  of  “gateway”  nodes.  These  in  turn  each  take  their  data  acoustically  from 
the  next  sublevel  of  “master”  nodes.  The  master  nodes  are  connected  (presumably  by 
wires)  to  sensors  sitting  on  the  ocean  floor. 

The  communication  between  command  centre  and  sensors  is  very  much  a  two-way 
affair.  The  sensors  process  and  fuse  some  of  their  data  locally,  passing  the  results  up  the 
chain  to  the  command  centre.  But  because  the  sensors  run  on  limited  battery  power,  the 
command  centre  must  be  very  careful  with  allocating  them  tasks.  Thus,  it  sets  the  status 
of  each  (“process  data”,  “relay  it  only  up  the  chain”,  “sleep”  or  “die”)  depending  on  how 
much  power  each  has.  The  command  centre  also  raises  or  lowers  detection  thresholds  in 
order  to  maintain  a  constant  false  alarm  rate  over  the  whole  field;  so  that  if  a  target  is 
known  to  be  in  one  region,  then  thresholds  can  be  lowered  for  sensors  in  that  region  (to 
maximise  detection  probabilities),  while  being  raised  in  other  areas  to  keep  the  false  alarm 
rate  constant. 

The processing for the sensors is done using both Kalman filtering and a fuzzy logic-based
α-β filter (with comparable results at less computational cost for the α-β filter).
Fuzzy  logic  is  also  used  to  adapt  the  amount  of  process  noise  used  by  the  Kalman  filter  to 
account  for  target  manoeuvres. 

The  paper  gives  a  broad  overview  of  the  processing  hierarchy  without  mentioning 
mathematical  details.  Rather,  it  tends  to  concentrate  more  on  the  architecture,  such  as 
the  necessity  for  a  two-way  data  flow  as  mentioned  above. 


2.4  Satellite  Positioning 


Heifetz et al. [13] describe a typical problem involved with satellite-attitude measurement.
They are dealing with the NASA Gravity Probe B, which was designed to be
put  into  Earth  orbit  for  a  year  or  more  in  a  precision  measurement  of  some  relativistic 
effects  that  make  themselves  felt  by  changes  in  the  satellite’s  attitude. 

Their  work  is  based  around  a  Kalman  filter,  but  the  nonlinearities  involved  mean  that 
at  the  very  least,  an  extended  Kalman  filter  is  required.  Unfortunately,  the  linearisation 
used  in  the  extended  Kalman  filter  introduces  a  well-understood  bias  into  two  of  the 
variables  being  measured.  The  authors  are  able  to  circumvent  this  difficulty  by  using  a 
new  algorithm  [14],  that  breaks  the  filtering  into  two  steps:  a  Kalman  filter  and  a  Gauss- 
Newton  algorithm. 

The  first  step,  the  Kalman  filter,  is  applied  by  writing  trigonometric  entities  such  as 
$\sin(\omega t + \delta)$ in terms of their separate $\sin\omega t$, $\cos\omega t$, $\sin\delta$, $\cos\delta$ constituents. Combinations
of  some  of  these  constituents  then  form  new  variables,  so  that  the  nonlinear  measurement 
equation  becomes  linear  in  those  variables.  Thus  a  linear  Kalman  filter  can  be  applied, 
and  the  state  estimate  it  produces  is  then  taken  as  a  synthetic  new  measurement,  to  be 
fed  to  the  Gauss-Newton  iterator. 
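
To illustrate the kind of substitution described above, the angle-addition identity can be written out as follows. This is only a sketch: the amplitude $A$ and the names $a$, $b$ are illustrative choices made here, not necessarily the variables used in [13, 14], and the frequency $\omega$ is assumed known.

```latex
A\sin(\omega t + \delta)
  = (A\cos\delta)\,\sin\omega t + (A\sin\delta)\,\cos\omega t
  = a\,\sin\omega t + b\,\cos\omega t ,
\qquad a \equiv A\cos\delta , \quad b \equiv A\sin\delta .
```

The signal is nonlinear in $(A, \delta)$ but linear in $(a, b)$, which is what allows a linear Kalman filter to be used. The original quantities can be recovered afterwards from $A = \sqrt{a^2 + b^2}$ and $\tan\delta = b/a$.
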

Although the paper was written before NASA’s satellite was due for launch, the authors
have  plotted  potentially  achievable  accuracies  which  show  that  in  principle,  the  expected 
relative  errors  should  be  very  small. 


2.5  Air  Surveillance 

In  [15],  Rodriguez  et  al.  discuss  a  proposal  to  fuse  data  in  an  air  surveillance  system. 
They  describe  a  system  whose  centre  is  the  Automatic  Dependent  Surveillance  system, 
in  which  participating  aircraft  send  their  navigation  details  to  Air  Traffic  Control  for 
assistance  in  marshalling. 

Since  the  proposed  scheme  uses  a  central  control  centre  for  fusion,  it  provides  a  good 
example  of  an  attempt  to  fuse  data  in  the  way  that  preserves  each  sensor’s  individuality  for 
as  long  as  possible,  which  thus  should  lead  to  the  best  results.  Air  Traffic  Control  accepts 
each  Automatic  Dependent  Surveillance  system  message  and  tries  to  associate  it  with  an 
existing track. It doesn’t do this on a message-by-message basis, but rather listens for
some  preset  period,  accumulating  the  incoming  data  that  arrives  during  this  time.  Once 
it  has  a  collection  of  data  sets,  it  updates  its  information  iteratively,  by  comparing  these 
data  sets  with  already-established  tracks. 


2.6  Image  Processing  and  Medical  Applications 

By  applying  information  theory,  Cooper  and  Miller  [16]  address  the  problem  of 
quantifying  the  efficacy  of  automatic  object  recognition.  They  begin  with  a  library  of 
templates  that  can  be  referenced  to  identify  objects,  with  departures  of  an  object’s  pose 
from  a  close  match  in  this  library  being  quantified  by  a  transformation  of  that  template. 
They  require  a  metric  specifying  how  well  a  given  object  corresponds  to  some  template, 
regardless  of  that  object’s  orientation  in  space. 

This is done by means of “mutual information”. They begin with the usual measures
of entropy $S(x)$, $S(y)$ and joint entropy $S(x, y)$ in terms of expected values:

$$S(x) = -E_x[\ln p(x)]$$

$$S(x, y) = -E_x E_y[\ln p(x, y)] . \tag{2.3}$$

Using these, the mutual information of $x$ and $y$ is defined as

$$I(x, y) = S(x) + S(y) - S(x, y) . \tag{2.4}$$

If  two  random  variables  are  independent,  then  their  joint  entropy  is  just  the  sum  of  their 
individual  entropies,  so  that  their  mutual  information  is  zero  as  expected.  On  the  other 
hand,  if  they  are  highly  matched,  their  mutual  information  is  also  high.  The  core  of 
Cooper  and  Miller’s  paper  is  their  calculation  of  the  mutual  information  for  three  scenarios: 
two  different  sorts  of  visual  mapping  (orthographic  and  perspective  projections),  and  the 
fusion  of  these.  That  is,  they  calculate  the  mutual  information  for  three  pairs  of  variables: 
one  element  of  each  pair  being  the  selected  template,  and  the  other  element  being  the 
orthographic  projection,  the  perspective  projection,  and  the  fusion  of  the  two  projections. 
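
The behaviour of (2.4) at its two extremes can be checked with a small numerical sketch. The two joint probability tables below are hypothetical examples chosen here for illustration; they are not data from Cooper and Miller's paper, and the function names are our own.

```python
import math

def entropy(probs):
    """Entropy S = -sum(p ln p) in nats, skipping zero-probability cells."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def mutual_information(joint):
    """I(x, y) = S(x) + S(y) - S(x, y), where joint[i][j] = p(x_i, y_j)."""
    px = [sum(row) for row in joint]            # marginal distribution of x
    py = [sum(col) for col in zip(*joint)]      # marginal distribution of y
    pxy = [p for row in joint for p in row]     # flattened joint distribution
    return entropy(px) + entropy(py) - entropy(pxy)

# Hypothetical joint tables for two binary variables.
independent = [[0.25, 0.25], [0.25, 0.25]]   # p(x, y) = p(x) p(y)
matched = [[0.5, 0.0], [0.0, 0.5]]           # y always equals x

print("independent:", mutual_information(independent))
print("matched:    ", mutual_information(matched))
```

For the independent table the mutual information vanishes, while for the perfectly matched table it equals the entropy of either variable alone, $\ln 2$.
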

For very low signal to noise ratios (SNRs), all three mutual informations are zero,
meaning  there  is  very  little  success  in  the  object-template  fits.  All  three  informations 
climb  as  the  SNR  increases,  tending  toward  a  common  upper  limit  of  about  6.5  for  the 
highest  SNR  values.  The  middle  of  the  SNR  range  is  where  we  see  the  interesting  results. 
As  hoped  for,  here  the  fused  scenario  gives  the  highest  mutual  information.  Typical  values 
in  the  middle  of  the  SNR  range  (SNR  =  10)  are  orthographic  projection:  3.0,  perspective 
projection:  3.8  and  fused  combination:  4.6. 

Similar work has been done by Viola and Gilles [17], who fuse image data by maximising
the mutual information. In contrast to Cooper and Miller’s work, they match
different  images  of  the  same  scene,  where  one  might  be  rotated,  out  of  focus  or  even 
chopped  up  into  several  dozen  smaller  squares.  They  achieve  good  results,  and  report 
that  the  method  of  mutual  information  is  more  robust  than  competing  techniques  such  as 
cross-correlation. 

Fuzzy  logic  has  been  applied  to  image  processing  in  the  work  of  Debon  et  al.  [18], 
who  use  it  in  locating  the  sometimes  vague  elliptical  cross-section  of  the  human  aorta  in 
ultrasound  images.  The  situation  they  describe  is  that  of  an  ultrasound  source  lowered 
down  a  patient’s  oesophagus,  producing  very  noisy  data  that  shows  slices  of  the  chest 
cavity  perpendicular  to  the  spine.  The  noise  is  due  partly  to  the  instrument,  and  partly 
to  natural  chest  movements  of  the  patient  during  the  process.  Within  these  ultrasound 
slices  they  hope  to  find  an  ellipse  that  marks  the  aorta  in  cross-section. 

Rather  than  using  the  common  approach  of  collecting  and  fusing  data  from  many 
sensors,  Debon  et  al.  use  perhaps  just  one  sensor  that  collects  data,  which  is  then  fused 
with  prior  information  about  the  scene  being  analysed.  In  this  case  the  authors  are  using 
textbook  information  about  the  usual  position  of  the  aorta  (since  this  is  not  likely  to  vary 
from patient to patient). This is an entirely reasonable thing to do, given that the same
principle  of  accumulated  knowledge  is  perhaps  the  main  contributor  for  the  well  known 
fact  that  humans  tend  to  be  better,  albeit  slower,  than  computers  at  doing  certain  complex 
tasks. 

The  fuzzy  model  that  the  authors  use  allocates  four  fuzzy  sets  to  the  ultrasound  image. 
These  are  sets  of  numbers  allocated  to  each  pixel,  quantifying  for  example  brightness  and 
its  gradient  across  neighbouring  pixels.  They  then  use  these  numbers  in  the  so-called 
Hough  transform,  a  method  that  can  detect  parametrised  curves  within  a  set  of  points. 

The  result  of  this  fusion  of  library  images  of  the  aorta  with  actual  data  is  that  an 
ellipse  is  able  to  be  fitted  to  an  otherwise  vague  outline  of  the  aorta  in  the  ultrasound 
images.  Inspection  of  the  ultrasound  images  shows  that  this  technique  works  very  well. 

A  simpler  approach  to  medical  data  fusion  is  taken  by  Zachary  and  Iyengar  [19], 
who  describe  a  method  for  fusing  data  to  reconstruct  biological  surfaces.  They  are  dealing 
with  three  sets  of  data:  namely,  contour  slices  that  result  from  imaging  in  three  orthogonal 
planes.  This  is  relatively  new  work,  in  the  sense  that  medical  imaging  is  usually  done  in 
a  single  plane. 

Their  approach  to  the  problem  does  not  actually  analyse  how  well  they  are  fusing  the 
three  sets  of  data.  Their  major  effort  lies  in  defining  a  good  coordinate  system  within 
which  to  work,  as  well  as  giving  care  to  ensuring  that  the  sets  of  data  are  all  scaled  to 
match each other correctly. Although the resulting surfaces that are drawn through the
points fit well, this has only been done in [19] for a spherical geometry. However, the
authors  do  describe  having  applied  their  method  to  ellipsoids  and  to  some  medical  data. 


2.7  Intelligent  Internet  Agents 

Intelligent internet agents are also discussed in the literature, although somewhat infrequently.
In reference [20], Stromberg discusses the makeup of a sensor management
system in terms of two architectures: agent modelling and multi-level sensor management.
His approach maintains that agents can be useful because, as an extension to the
object-oriented approach that is so necessary to modern programming, they allow a high degree of
robustness  and  re-usability  in  a  system.  He  points  out  that  in  a  typical  tracking  problem, 
different  modes  of  operation  are  necessary:  fast  revisits  to  establish  a  candidate  track, 
with  variable  revisit  times  once  the  track  is  established.  Agents  are  seen  to  be  well  suited 
to  this  work,  since  they  can  be  left  alone  to  make  their  own  decisions  about  just  when  to 
make  an  observation. 


2.8  Business  and  Finance 

An  application  of  fusion  to  the  theory  of  finance  is  described  by  Blasch  [21].  He 
discusses  the  interaction  between  monetary  policy,  being  concerned  with  money  demand 
and  income,  and  fiscal  policy,  the  interaction  between  interest  rates  and  income.  The 
multiple  sensors  here  are  the  various  sources  of  information  that  the  government  uses 
to  determine  such  indicators  as  changes  in  interest  rates.  However,  these  sources  have 
differing  update  frequencies,  from  hourly  to  weekly  or  longer.  The  perceived  need  to 
update markets continually means that such inputs are required to be combined in a way
that  acknowledges  the  different  confidences  in  each. 

Blasch  quantifies  the  policies  using  a  model  with  added  Gaussian  noise  to  allow  the 
dynamics  to  be  approximated  linearly,  with  most  but  not  all  of  his  noise  being  white. 
Not  surprisingly,  he  uses  a  Kalman  filter  for  the  task,  together  with  wavelet  transforms 
introduced  because  of  the  different  resolution  levels  being  considered  (since  wavelets  were 
designed  to  analyse  models  with  different  levels  of  resolution).  An  appreciation  of  Blasch’s 
analysis  requires  a  good  understanding  of  fiscal  theory,  but  his  overall  conclusion  is  that 
the  Kalman  filter  has  served  the  model  very  well. 


3  Bayesian  Data  Fusion 

We  will  begin  our  presentation  of  Bayesian  Data  Fusion  by  first  reviewing  Bayes’ 
theorem.  To  simplify  the  expressions  that  follow,  we  shorten  the  notation  of  p(A)  for  the 
probability  of  some  event  A  occurring  to  just  (A):  the  “p”  is  so  ubiquitous  that  we  will 
leave it out entirely. Also, the probability that two events $A$, $B$ occur is written as $(A, B)$,
and this can be related to the probability $(A \mid B)$ of $A$ occurring given that $B$ has already
occurred:

$$(A, B) = (A \mid B)\,(B) . \tag{3.1}$$

Now since $(A, B) = (B, A)$, we have immediately that

$$(A \mid B) = \frac{(B \mid A)\,(A)}{(B)} . \tag{3.2}$$

If there are several events $A_i$ that are distinguished from $B$ in some way, then the denominator
$(B)$ acts merely as a normalisation, so that

$$(A \mid B) = \frac{(B \mid A)\,(A)}{\sum_i (B \mid A_i)\,(A_i)} . \tag{3.3}$$


Equations (3.2) or (3.3) are known as Bayes’ rule, and are very fruitful in developing
the ideas of data fusion. As we said, the denominator of (3.3) can be seen as a simple
normalisation; alternatively, the fact that the $(B)$ of (3.2) can be expanded into the denominator
of (3.3) is an example of the Chapman-Kolmogorov identity that follows from
standard statistical theory:

$$(A \mid B) = \sum_i (A \mid X_i, B)\,(X_i \mid B) , \tag{3.4}$$

which  we  use  repeatedly  in  the  calculations  of  this  report. 

Bayes’ rule divides statisticians over the idea of how best to estimate an unknown
parameter  from  a  set  of  data.  For  example,  we  might  wish  to  identify  an  aircraft  based 
on  a  set  of  measurements  of  useful  parameters,  so  that  from  this  data  set  we  must  extract 
the  “best”  value  of  some  quantity  x.  Two  important  estimates  of  this  best  value  of  x  are: 

Maximum likelihood estimate: the value of $x$ that maximises $(\text{data} \mid x)$

Maximum a posteriori estimate: the value of $x$ that maximises $(x \mid \text{data})$


There  can  be  a  difference  between  these  two  estimates,  but  they  can  always  be  related 
using  Bayes’  rule. 
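
The relation between the two estimates can be made explicit with a short worked step using (3.2). The labels $x_{\rm MAP}$ and $x_{\rm ML}$ are introduced here purely for illustration.

```latex
x_{\rm MAP} = \arg\max_x\, (x \mid \text{data})
            = \arg\max_x\, \frac{(\text{data} \mid x)\,(x)}{(\text{data})}
            = \arg\max_x\, (\text{data} \mid x)\,(x) ,
\qquad
x_{\rm ML} = \arg\max_x\, (\text{data} \mid x) .
```

The denominator $(\text{data})$ does not depend on $x$ and so drops out of the maximisation. The two estimates therefore coincide whenever the prior $(x)$ is constant over the values of $x$ being considered, and can differ otherwise.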

A standard difficulty encountered when applying Bayes’ theorem is in supplying values
for the so-called prior probability $(A)$ in Equation (3.3). As an example, suppose several
sensors have supplied data from which we must identify a target aircraft. From (3.3), the
chance that the aircraft is an F-111 on the available evidence is

$$(\text{F-111} \mid \text{data}) = \frac{(\text{data} \mid \text{F-111})\,(\text{F-111})}{(\text{data} \mid \text{F-111})\,(\text{F-111}) + (\text{data} \mid \text{F/A-18})\,(\text{F/A-18}) + \dots} . \tag{3.5}$$

It may well be easy to calculate $(\text{data} \mid \text{F-111})$, but now we are confronted with the question:
what is $(\text{F-111})$, $(\text{F/A-18})$ etc.? These are prior probabilities: the chance that the aircraft
in question could really be, for example, an F-111, irrespective of what data has been taken.
Perhaps F-111s are not known to fly in the particular area in which we are collecting data,
in which case $(\text{F-111})$ is presumably very small.

We  might  have  no  way  of  supplying  these  priors  initially,  so  that  in  the  absence  of  any 
information,  the  approach  that  is  most  often  taken  is  to  set  them  all  to  be  equal.  As  it 
happens, when Bayes’ rule is part of an iterative scheme these priors will change unequally
on  each  iteration,  acquiring  more  meaningful  values  in  the  process. 
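
A minimal sketch of such an iterative scheme follows, starting from equal priors and applying (3.3) repeatedly. The likelihood values are hypothetical numbers invented here for illustration, and the function name `bayes_update` is our own.

```python
def bayes_update(priors, likelihoods):
    """Equation (3.3): posterior = likelihood * prior / sum(likelihood * prior)."""
    unnormalised = {k: likelihoods[k] * priors[k] for k in priors}
    total = sum(unnormalised.values())   # the normalising denominator of (3.3)
    return {k: v / total for k, v in unnormalised.items()}

# Hypothetical likelihoods (data | aircraft type) for three successive measurements.
measurements = [
    {"F-111": 0.6, "F/A-18": 0.3, "P-3C": 0.1},
    {"F-111": 0.5, "F/A-18": 0.4, "P-3C": 0.1},
    {"F-111": 0.7, "F/A-18": 0.2, "P-3C": 0.1},
]

# With no prior information, all priors are set equal.
beliefs = {"F-111": 1 / 3, "F/A-18": 1 / 3, "P-3C": 1 / 3}

for likelihoods in measurements:
    # The posterior from this iteration becomes the prior for the next.
    beliefs = bayes_update(beliefs, likelihoods)
    print({k: round(v, 3) for k, v in beliefs.items()})
```

Although the priors start out equal, they separate after the first measurement and continue to sharpen as further data arrive.
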


3.1 Single Sensor Tracking

As  a  first  example  of  data  fusion,  we  apply  Bayes’  rule  to  tracking.  Single  sensor 
tracking,  also  known  as  filtering,  involves  a  combining  of  successive  measurements  of  the 
state  of  a  system,  and  as  such  it  can  be  thought  of  as  a  fusing  of  data  from  a  single  sensor 
over time as opposed to over a set of sensors, which we leave for the next section. Suppose then that
a sensor is tracking a target, and makes observations of the target at various intervals.
Define the following terms:

$$\begin{aligned}
x_k &= \text{target state at “time” } k \text{ (iteration number } k\text{)} \\
y_k &= \text{observation made of target at time } k \\
Y_k &= \text{set of all observations made of target up to time } k \\
    &= \{y_1, y_2, \dots, y_k\} .
\end{aligned} \tag{3.6}$$

The fundamental problem to be solved is to find the new estimate of the target state $(x_k \mid Y_k)$
given the old estimate $(x_{k-1} \mid Y_{k-1})$. That is, we require the probability that the target
is something specific given the latest measurement and all previous measurements, given
that we know the corresponding probability one time step back. To apply Bayes’ rule for
the set $Y_k$, we separate the latest measurement $y_k$ from the rest of the set $Y_{k-1}$ (since $Y_{k-1}$
has already been used in the previous iteration) to write $(x_k \mid Y_k)$ as $(x_k \mid y_k, Y_{k-1})$. We shall
swap the two terms $x_k$, $y_k$ using a minor generalisation of Bayes’ rule. This generalisation
is easily shown by equating the probabilities for the three events $(A, B, C)$ and $(B, A, C)$,
expressed using conditionals as in Equation (3.1):

$$(A, B, C) = (A \mid B, C)\,(B \mid C)\,(C) ; \tag{3.7}$$

$$(B, A, C) = (B \mid A, C)\,(A \mid C)\,(C) ; \tag{3.8}$$

so that Bayes’ rule becomes

$$(A \mid B, C) = \frac{(B \mid A, C)\,(A \mid C)}{(B \mid C)} . \tag{3.9}$$

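Before proceeding, we note that since only the latest time $k$ and the next latest $k-1$ appear
in the following expressions, we can simplify them by replacing $k$ with 1 and $k-1$ with 0.
So we write

$$\underbrace{(x_1 \mid Y_1)}_{\text{“conditional density”}} = (x_1 \mid y_1, Y_0) = \frac{\overbrace{(y_1 \mid x_1, Y_0)}^{\text{“likelihood”}}\ \overbrace{(x_1 \mid Y_0)}^{\text{“predicted density”}}}{\underbrace{(y_1 \mid Y_0)}_{\text{normalisation}}} . \tag{3.10}$$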


There  are  three  terms  in  this  equation,  and  we  consider  each  in  turn. 

The likelihood deals with the probability of a measurement $y_1$. We will assume the noise
is “white”, meaning uncorrelated in time,¹ so that the latest measurement does not depend
on previous measurements. In that case the likelihood (and hence normalisation) can be
simplified:

$$\text{likelihood} = (y_1 \mid x_1, Y_0) = (y_1 \mid x_1) . \tag{3.11}$$

¹ Such noise is called white because a Fourier expansion must yield equal amounts of all frequencies.




The predicted density predicts $x_1$ based on old data. It can be expanded using the
Chapman-Kolmogorov identity:

$$\text{predicted density} = (x_1 \mid Y_0) = \int dx_0\ \underbrace{(x_1 \mid x_0, Y_0)}_{\text{“transition density”}}\ \overbrace{(x_0 \mid Y_0)}^{\text{result from previous iteration (“prior”)}} . \tag{3.12}$$

We  will  also  assume  the  system  obeys  a  Markov  evolution,  implying  that  its  current  state 
directly  depends  only  on  its  previous  state,  with  any  dependence  on  old  measurements 
encapsulated  in  that  previous  state.  Thus  the  transition  density  in  (3.12)  can  be  simplified 
to $(x_1 \mid x_0)$, changing that equation to

$$\text{predicted density} = (x_1 \mid Y_0) = \int dx_0\ (x_1 \mid x_0)\,(x_0 \mid Y_0) . \tag{3.13}$$

Lastly,  the  normalisation  can  be  expanded  by  way  of  Chapman-Kolmogorov,  using  the 
now-simplified  likelihood  and  the  predicted  density: 

$$\text{normalisation} = (y_1 \mid Y_0) = \int dx_1\ (y_1 \mid x_1, Y_0)\,(x_1 \mid Y_0) = \int dx_1\ (y_1 \mid x_1)\,(x_1 \mid Y_0) . \tag{3.14}$$

Finally then, Equation (3.10) relates $(x_1 \mid Y_1)$ to $(x_0 \mid Y_0)$ via Equations (3.11)-(3.14), and
our problem is solved.
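
When the state takes only a few discrete values, the integrals become sums and the whole recursion can be sketched in a few lines. The example below is an illustration under assumptions made here: a hypothetical three-cell world, an invented transition table and sensor model, and function names of our own choosing.

```python
def predict(prior, transition):
    """Equation (3.13) on a grid: (x1|Y0) = sum over x0 of (x1|x0) (x0|Y0)."""
    n = len(prior)
    return [sum(transition[x1][x0] * prior[x0] for x0 in range(n))
            for x1 in range(n)]

def update(predicted, likelihood):
    """Equations (3.10), (3.11), (3.14): weight by (y1|x1), then normalise."""
    unnormalised = [l * p for l, p in zip(likelihood, predicted)]
    normalisation = sum(unnormalised)    # this is (y1|Y0)
    return [u / normalisation for u in unnormalised]

# Hypothetical Markov model: the target tends to step one cell to the right
# (wrapping around). transition[x1][x0] = (x1 | x0); each column sums to 1.
transition = [
    [0.2, 0.0, 0.8],
    [0.8, 0.2, 0.0],
    [0.0, 0.8, 0.2],
]

# Hypothetical sensor model: likelihood_of[y][x1] = (y | x1), white noise assumed.
likelihood_of = {
    0: [0.8, 0.1, 0.1],
    1: [0.1, 0.8, 0.1],
    2: [0.1, 0.1, 0.8],
}

belief = [1 / 3, 1 / 3, 1 / 3]           # (x0 | Y0): no initial knowledge
for y in [1, 2, 0]:                      # successive measurements
    belief = update(predict(belief, transition), likelihood_of[y])
    print([round(b, 3) for b in belief])
```

Each pass through the loop is one application of (3.10): the previous posterior is propagated through the transition density, weighted by the likelihood of the new measurement, and renormalised.
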


An  Example:  Deriving  the  Kalman  Filter 

As  noted  above,  the  Kalman  filter  is  an  example  of  combining  data  over  time  as  opposed 
to  sensor  number.  Bayes’  rule  gives  a  very  accessible  derivation  of  it  based  on  the  preceding 
equations.  Our  analysis  actually  requires  two  matrix  theorems  given  in  Appendix  A.  These 
theorems are reasonable in that they express Gaussian behaviour that is familiar in the
one-dimensional case. Refer to Appendix A for the definition of the notation $N(x; \mu, P)$ that we use.

In particular, Equation (A5) gives a direct method for calculating the predicted probability
density in Equation (3.13), which then allows us to use the Bayesian framework [22]
to derive the Kalman filter equation. A derivation of the Kalman filter based on Bayesian
belief networks was proposed recently in [23]. However, in both these papers the authors
do not solve for the predicted density (3.13) directly. They implicitly use a “sum of two
Gaussian random variables is a Gaussian random variable” argument to solve for the predicted
density. While alternative methods for obtaining this density by using characteristic
functions  exist  in  the  literature,  we  consider  a  direct  solution  of  the  Chapman-Kolmogorov 
equation  as  a  basis  for  the  predicted  density  function.  This  approach  is  more  general  and 
is  the  basis  of  many  advanced  filters,  such  as  particle  filters.  In  a  linear  Gaussian  case,  we 
will  show  that  the  solution  of  the  Chapman-Kolmogorov  equation  reduces  to  the  Kalman 
predictor  equation.  To  the  best  of  our  knowledge,  this  is  an  original  derivation  of  the 
prediction  integral,  Equation  (3.22). 

First, assume that the target is unique, and that the sensor is always able to detect it.
The problem to be solved is: given a set $Y_k$ of measurements up until the current time $k$,
estimate the current state $x_k$; this estimate is called $\hat{x}_{k|k}$ in the literature, to distinguish
it from $\hat{x}_{k|k-1}$, the estimate of $x_k$ given measurements up until time $k-1$. Further, as
above we will simplify the notation by replacing $k-1$ and $k$ with 0 and 1 respectively. So
begin with the expected value of $x_1$:

$$\hat{x}_{1|1} = \int dx_1\ x_1\,(x_1 \mid Y_1) . \tag{3.15}$$


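From Equations (3.10, 3.11) we can write the conditional density $(x_1 \mid Y_1)$ as

$$(x_1 \mid Y_1) = \frac{\overbrace{(y_1 \mid x_1)}^{\text{likelihood}}\ \overbrace{(x_1 \mid Y_0)}^{\text{predicted density}}}{\underbrace{(y_1 \mid Y_0)}_{\text{normalisation}}} . \tag{3.16}$$

We need the following quantities: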


Likelihood $(y_1 \mid x_1)$: This is derived from the measurement dynamics, assumed linear:

$$y_1 = H x_1 + w_1 , \tag{3.17}$$

where $w_1$ is a noise term, assumed Gaussian with zero mean and covariance $R_1$. Given $x_1$,
the probability of obtaining a measurement $y_1$ must be equal to the probability of obtaining
the noise $w_1$:

$$(y_1 \mid x_1) = (w_1) = (y_1 - H x_1) = N(y_1 - H x_1; 0, R_1) = N(y_1; H x_1, R_1) . \tag{3.18}$$


Predicted density $(x_1 \mid Y_0)$: Using (3.13), we need the transition density $(x_1 \mid x_0)$ and the
prior $(x_0 \mid Y_0)$. The transition density results from the system dynamics (assumed linear):

$$x_1 = F x_0 + v_1 + \text{perhaps some constant term} , \tag{3.19}$$

where $v_1$ is a noise term that reflects uncertainty in the dynamical model, again assumed
Gaussian with zero mean and covariance $Q_1$. Then just as for the likelihood, we can write

$$(x_1 \mid x_0) = (v_1) = (x_1 - F x_0) = N(x_1 - F x_0; 0, Q_1) = N(x_1; F x_0, Q_1) . \tag{3.20}$$

The prior is also assumed to be Gaussian:

$$(x_0 \mid Y_0) = N(x_0; \hat{x}_{0|0}, P_{0|0}) . \tag{3.21}$$

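Thus from (3.13) the predicted density is

$$(x_1 \mid Y_0) = \int dx_0\ N(x_1; F x_0, Q_1)\ N(x_0; \hat{x}_{0|0}, P_{0|0}) \overset{\text{(A5)}}{=} N(x_1; \hat{x}_{1|0}, P_{1|0}) , \tag{3.22}$$

where

$$\hat{x}_{1|0} = F \hat{x}_{0|0} , \qquad P_{1|0} = F P_{0|0} F^T + Q_1 . \tag{3.23}$$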
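
The closed form in (3.22) and (3.23) can be checked numerically in one dimension, where all the matrices reduce to numbers. The sketch below evaluates the Chapman-Kolmogorov integral by a simple midpoint rule and compares it with the Gaussian on the right-hand side of (3.22). The scalar setting, the numerical values and the function names are all assumptions made here for illustration.

```python
import math

def normal_pdf(x, mean, var):
    """One-dimensional N(x; mean, var)."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def predicted_density_numeric(x1, F, Q, x00, P00, steps=4000, width=10.0):
    """Midpoint-rule estimate of the integral over x0 in Equation (3.22)."""
    sigma = math.sqrt(P00)
    lower = x00 - width * sigma              # integrate over +/- width sigma of the prior
    dx = 2.0 * width * sigma / steps
    total = 0.0
    for i in range(steps):
        x0 = lower + (i + 0.5) * dx
        total += normal_pdf(x1, F * x0, Q) * normal_pdf(x0, x00, P00)
    return total * dx

# Hypothetical scalar values for the dynamics and the prior.
F, Q = 1.5, 0.4
x00, P00 = 2.0, 0.3
x1 = 3.3                                     # point at which to evaluate (x1|Y0)

numeric = predicted_density_numeric(x1, F, Q, x00, P00)
closed = normal_pdf(x1, F * x00, F * P00 * F + Q)   # Equations (3.22), (3.23)
print(numeric, closed)
```

The two numbers agree to within the accuracy of the numerical integration, as (3.22) requires.
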

Normalisation $(y_1 \mid Y_0)$: This is an integral over quantities that we have already dealt
with:

$$\begin{aligned}
(y_1 \mid Y_0) &= \int dx_1\ (y_1 \mid x_1)\,(x_1 \mid Y_0) \\
&\overset{(3.18),\,(3.22)}{=} \int dx_1\ N(y_1; H x_1, R_1)\ N(x_1; \hat{x}_{1|0}, P_{1|0}) \\
&\overset{\text{(A5)}}{=} N(y_1; H \hat{x}_{1|0}, S_1) ,
\end{aligned} \tag{3.24}$$

where

$$S_1 = H P_{1|0} H^T + R_1 . \tag{3.25}$$

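Putting it all together, the conditional density can now be constructed through Equations
(3.16, 3.18, 3.22, 3.24):

$$(x_1 \mid Y_1) = \frac{N(y_1; H x_1, R_1)\ N(x_1; \hat{x}_{1|0}, P_{1|0})}{N(y_1; H \hat{x}_{1|0}, S_1)} \overset{\text{(A3)}}{=} N(x_1; X_1, P_{1|1}) , \tag{3.26}$$

where

$$\begin{aligned}
K &= P_{1|0} H^T \left(H P_{1|0} H^T + R_1\right)^{-1} \quad \text{(used in next lines)} \\
X_1 &= \hat{x}_{1|0} + K\,(y_1 - H \hat{x}_{1|0}) \\
P_{1|1} &= (1 - K H)\, P_{1|0} .
\end{aligned} \tag{3.27}$$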

Finally, we must calculate the integral in (3.15) to find the estimate of the current state
given the very latest measurement:

$$\hat{x}_{1|1} = \int dx_1\ x_1\, N(x_1; X_1, P_{1|1}) = X_1 , \tag{3.28}$$

a result that follows trivially, since it is just the calculation of the mean of the normal
distribution, and that is plainly $X_1$.

This then, is the Kalman filter. Starting with $\hat{x}_{0|0}$, $P_{0|0}$ (which must be estimated at the
beginning of the iterations), and $Q_1$, $R_1$ (really $Q_k$, $R_k$ for all $k$), we can then calculate $\hat{x}_{1|1}$
by applying the following equations in order, which have been singled out in the best order
of evaluation from (3.23, 3.27, 3.28):

$$\begin{aligned}
P_{1|0} &= F P_{0|0} F^T + Q_1 \\
K &= P_{1|0} H^T \left(H P_{1|0} H^T + R_1\right)^{-1} \\
P_{1|1} &= (1 - K H)\, P_{1|0} \\
\hat{x}_{1|0} &= F \hat{x}_{0|0} \\
\hat{x}_{1|1} &= \hat{x}_{1|0} + K\,(y_1 - H \hat{x}_{1|0})
\end{aligned} \tag{3.29}$$

The procedure is iterative, so that the latest estimates $\hat{x}_{1|1}$, $P_{1|1}$ become the old estimates
$\hat{x}_{0|0}$, $P_{0|0}$ in the next iteration, which always incorporates the latest data $y_1$. This
is a good example of applying the Bayesian approach to a tracking problem, where only
one sensor is involved.
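
As a minimal sketch of the iteration, the five lines of (3.29) can be coded directly for the simplest case of a scalar state, in which $F$, $H$, $Q$, $R$ and $P$ are plain numbers and the matrix inverse becomes a division. The scalar restriction, the measurement values and the noise variances below are assumptions chosen here for illustration only.

```python
def kalman_step(x00, P00, y1, F, H, Q, R):
    """One iteration of Equation (3.29) for a scalar state.

    x00, P00: previous estimate and its variance
    y1:       latest measurement
    Returns the updated estimate x11 and its variance P11.
    """
    P10 = F * P00 * F + Q                 # predicted variance
    K = P10 * H / (H * P10 * H + R)       # gain
    P11 = (1.0 - K * H) * P10             # updated variance
    x10 = F * x00                         # predicted state
    x11 = x10 + K * (y1 - H * x10)        # updated state
    return x11, P11

# Hypothetical scenario: a nearly constant quantity close to 5.0, observed
# directly (F = H = 1) with measurement noise variance R = 0.25.
estimate, variance = 0.0, 100.0           # vague initial guess
for y in [5.2, 4.7, 5.1, 4.9, 5.3]:
    # The latest estimate becomes the old estimate for the next iteration.
    estimate, variance = kalman_step(estimate, variance, y,
                                     F=1.0, H=1.0, Q=0.01, R=0.25)
    print(round(estimate, 3), round(variance, 4))
```

The large initial variance means the first measurement is trusted almost completely; as the variance shrinks, later measurements adjust the estimate by progressively smaller amounts.
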




Figure 1: Different types of data fusion: centralised (top), centralised with
preprocessing done at each sensor (middle), and a hybrid of the two (bottom)


3.2 Fusing Data From Several Sensors


Figure  1  depicts  a  sampling  of  ways  to  fuse  data  from  several  sensors.  Centralising  the 
fusion  combines  all  of  the  raw  data  from  the  sensors  in  one  main  processor.  In  principle 
this  is  the  best  way  to  fuse  data  in  the  sense  that  nothing  has  been  lost  in  preprocessing; 
but  in  practice  centralised  fusion  leads  to  a  huge  amount  of  data  traversing  the  network, 
which  is  not  necessarily  practical  or  desirable.  Preprocessing  the  data  at  each  sensor 
reduces  the  amount  of  data  flow  needed,  while  in  practice  the  best  setup  might  well  be  a 
hybrid  of  these  two  types. 

Bayes’ rule serves to give a compact calculation for the fusion of data from several
sensors.  Extend  the  notation  from  the  previous  section,  with  time  as  a  subscript,  by 
adding  a  superscript  to  denote  sensor  number: 


$$\begin{aligned}
\text{Single sensor output at indicated time step} &= y^{\text{sensor number}}_{\text{time step}} \\
\text{All data up to and including time step} &= Y^{\text{sensor number}}_{\text{time step}}
\end{aligned} \tag{3.30}$$


Fusing  Two  Sensors 

The  following  example  of  fusion  with  some  preprocessing  shows  the  important  points  in 
the general process. Suppose two sensors are observing a target, whose signature ensures
that it is either an F-111, an F/A-18 or a P-3C Orion. We will derive the technique here
for  the  fusing  of  the  sensors’  preprocessed  data. 

Sensor 1’s latest data set is denoted $Y_1^1$, formed by the addition of its current measurement
$y_1^1$ to its old data set $Y_0^1$. Similarly, sensor 2 adds its latest measurement $y_1^2$ to
its old data set $Y_0^2$. The relevant measurements are in Table 1. Of course these are not
in  any  sense  raw  data.  Each  sensor  has  made  an  observation,  and  then  preprocessed  it 
to  estimate  what  type  the  aircraft  might  be,  through  the  use  of  tracking  involving  that 
observation  and  those  preceding  it  (as  described  in  the  previous  section). 

As can be seen from the old data, $Y_0^1$, $Y_0^2$, both sensors are leaning towards identifying
the target as an F-111. Their latest data, $y_1^1$, $y_1^2$, makes them even more sure of this. The
fusion node has allocated probabilities for the fused sensor pair as given in the table, with
e.g. 0.5 for the F-111. These fused probabilities are what we wish to calculate for the latest
data; the 0.5, 0.4, 0.1 values listed in the table might be prior estimates of what the target
could  reasonably  be  (if  this  is  our  first  iteration),  or  they  might  be  based  on  a  previous 
iteration  using  old  data.  So  for  example  if  the  plane  is  known  to  be  flying  at  high  speed, 
then  it  probably  is  not  the  Orion,  in  which  case  this  aircraft  should  be  allocated  a  smaller 
prior  probability  than  the  other  two. 

Table 1: All data from sensors 1 and 2 in Section 3.2

Sensor 1 old data:                                Sensor 2 old data:
$(x = \text{F-111} \mid Y_0^1) = 0.4$             $(x = \text{F-111} \mid Y_0^2) = 0.6$
$(x = \text{F/A-18} \mid Y_0^1) = 0.4$            $(x = \text{F/A-18} \mid Y_0^2) = 0.3$
$(x = \text{P-3C} \mid Y_0^1) = 0.2$              $(x = \text{P-3C} \mid Y_0^2) = 0.1$

Sensor 1 new data:                                Sensor 2 new data:
$(x = \text{F-111} \mid Y_1^1) = 0.70$            $(x = \text{F-111} \mid Y_1^2) = 0.80$
$(x = \text{F/A-18} \mid Y_1^1) = 0.29$           $(x = \text{F/A-18} \mid Y_1^2) = 0.15$
$(x = \text{P-3C} \mid Y_1^1) = 0.01$             $(x = \text{P-3C} \mid Y_1^2) = 0.05$

Fusion node has:
$(x = \text{F-111} \mid Y_0^1 Y_0^2) = 0.5$
$(x = \text{F/A-18} \mid Y_0^1 Y_0^2) = 0.4$
$(x = \text{P-3C} \mid Y_0^1 Y_0^2) = 0.1$


Now how does the fusion node combine this information? With the target labelled $x$,
the fusion node wishes to know the probability of $x$ being one of the three aircraft types,
given the latest set of data: $(x \mid Y_1^1 Y_1^2)$. This can be expressed in terms of its constituents
using Bayes’ rule:

$$(x \mid Y_1^1 Y_1^2) = (x \mid y_1^1 y_1^2\, Y_0^1 Y_0^2) = \frac{(y_1^1 y_1^2 \mid x, Y_0^1 Y_0^2)\,(x \mid Y_0^1 Y_0^2)}{(y_1^1 y_1^2 \mid Y_0^1 Y_0^2)} . \tag{3.31}$$

The sensor measurements are assumed independent, so that

$$(y_1^1 y_1^2 \mid x, Y_0^1 Y_0^2) = (y_1^1 \mid x, Y_0^1)\,(y_1^2 \mid x, Y_0^2) . \tag{3.32}$$

In that case, (3.31) becomes

$$(x \mid Y_1^1 Y_1^2) = \frac{(y_1^1 \mid x, Y_0^1)\,(y_1^2 \mid x, Y_0^2)\,(x \mid Y_0^1 Y_0^2)}{(y_1^1 y_1^2 \mid Y_0^1 Y_0^2)} . \tag{3.33}$$


If  we  now  use  Bayes’  rule  to  again  swap  the  data  y  and  target  state  x  in  the  first  two 
terms  of  the  numerator  of  (3.33),  we  obtain  the  final  recipe  for  how  to  fuse  the  data: 


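$$
\begin{aligned}
(x \mid Y_1^1 Y_1^2)
&= \frac{(x \mid Y_1^1)\,(y_1^1 \mid Y_0^1)}{(x \mid Y_0^1)} \;
   \frac{(x \mid Y_1^2)\,(y_1^2 \mid Y_0^2)}{(x \mid Y_0^2)} \;
   \frac{(x \mid Y_0^1 Y_0^2)}{(y_1^1 y_1^2 \mid Y_0^1 Y_0^2)} \\
&= \frac{(x \mid Y_1^1)\,(x \mid Y_1^2)\,(x \mid Y_0^1 Y_0^2)}{(x \mid Y_0^1)\,(x \mid Y_0^2)} \times \text{normalisation} .
\end{aligned}
\tag{3.34}
$$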




The  necessary  quantities  are  listed  in  Table  1,  so  that  (3.34)  gives 


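$$
\begin{aligned}
(x = \text{F-111} \mid Y_1^1 Y_1^2) &\propto \frac{0.70 \times 0.80 \times 0.5}{0.4 \times 0.6} \\
(x = \text{F/A-18} \mid Y_1^1 Y_1^2) &\propto \frac{0.29 \times 0.15 \times 0.4}{0.4 \times 0.3} \\
(x = \text{P-3C} \mid Y_1^1 Y_1^2) &\propto \frac{0.01 \times 0.05 \times 0.1}{0.2 \times 0.1}
\end{aligned}
\tag{3.35}
$$

These are easily normalised, becoming finally

$$
\begin{aligned}
(x = \text{F-111} \mid Y_1^1 Y_1^2) &\simeq 88.8\% \\
(x = \text{F/A-18} \mid Y_1^1 Y_1^2) &\simeq 11.0\% \\
(x = \text{P-3C} \mid Y_1^1 Y_1^2) &\simeq 0.2\% .
\end{aligned}
\tag{3.36}
$$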

Thus for the chance that the target is an F-111, the two latest probabilities of 70%, 80%
derived from sensor measurements have fused to update the old value of 50% to a new
value of 88.8%, and so on as summarised in Table 2. These numbers reflect the strong
belief that the target is highly likely to be an F-111, less probably an F/A-18, and almost
certainly not an Orion.
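
The arithmetic of (3.34) to (3.36) can be reproduced with a short script. What follows is only a sketch: the function name `bayes_fuse` and the dictionary layout are illustrative choices made here, not part of the report, and the input numbers are simply those of Table 1.

```python
# Sketch of the two-sensor fusion recipe (3.34), using the Table 1 values.
# All names below are illustrative; they do not come from the report.

AIRCRAFT = ["F-111", "F/A-18", "P-3C"]

old_1 = {"F-111": 0.4, "F/A-18": 0.4, "P-3C": 0.2}       # (x | Y_0^1)
old_2 = {"F-111": 0.6, "F/A-18": 0.3, "P-3C": 0.1}       # (x | Y_0^2)
new_1 = {"F-111": 0.70, "F/A-18": 0.29, "P-3C": 0.01}    # (x | Y_1^1)
new_2 = {"F-111": 0.80, "F/A-18": 0.15, "P-3C": 0.05}    # (x | Y_1^2)
old_fused = {"F-111": 0.5, "F/A-18": 0.4, "P-3C": 0.1}   # (x | Y_0^1 Y_0^2)


def bayes_fuse(new_posts, old_posts, old_fused):
    """Multiply the old fused value by new_i / old_i for each sensor, then normalise."""
    relative = {}
    for x in old_fused:
        weight = old_fused[x]
        for new, old in zip(new_posts, old_posts):
            weight *= new[x] / old[x]
        relative[x] = weight
    total = sum(relative.values())
    return {x: w / total for x, w in relative.items()}


fused = bayes_fuse([new_1, new_2], [old_1, old_2], old_fused)
for x in AIRCRAFT:
    print(f"{x}: {100 * fused[x]:.1f}%")
```

Because the function loops over a list of sensors, the same sketch also covers three or more sensors, provided the old fused value supplied is the one conditioned on all of the sensors' old data.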


Table 2: Evolution of probabilities for the various aircraft

                          Latest sensor probs:
Target type   Old value   Sensor 1   Sensor 2   New value
F-111         50%         70%        80%        88.8%
F/A-18        40%         29%        15%        11.0%
P-3C          10%         1%         5%         0.2%


Three  or  More  Sensors 


The  analysis  that  produced  Equation  (3.34)  is  easily  generalised  for  the  case  of  multiple 
sensors.  The  three  sensor  result  is 

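$$
(x \mid Y_1^1 Y_1^2 Y_1^3)
= \frac{(x \mid Y_1^1)\,(x \mid Y_1^2)\,(x \mid Y_1^3)\,(x \mid Y_0^1 Y_0^2 Y_0^3)}{(x \mid Y_0^1)\,(x \mid Y_0^2)\,(x \mid Y_0^3)} \times \text{normalisation} ,
\tag{3.37}
$$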


and  so  on  for  more  sensors.  This  expression  also  shows  that  the  fusion  order  is  irrelevant, 
a  result  that  also  holds  in  Dempster-Shafer  theory.  Without  a  doubt,  this  fact  simplifies 
multiple sensor fusion enormously.


4 Dempster-Shafer Data Fusion


The Bayes and Dempster-Shafer approaches are both based on the concept of attaching
weightings to the postulated states of the system being measured. While Bayes
applies a more “classical” meaning to these in terms of well known ideas about probability,
Dempster-Shafer [24, 25] allows other alternative scenarios for the system, such as
treating equally the sets of alternatives that have a nonzero intersection: for example,
we can combine all of the alternatives to make a new state corresponding to “unknown”.
But the weightings, which in Bayes’ classical probability theory are probabilities, are less
well understood in Dempster-Shafer theory. Dempster-Shafer’s analogous quantities are
called masses, underlining the fact that they are only more or less to be understood as
probabilities.

Dempster-Shafer theory assigns its masses to all of the subsets of the entities that
comprise a system. Suppose for example that the system has 5 members. We can label
them all, and describe any particular subset by writing say “1” next to each element that
is in the subset, and “0” next to each one that isn’t. In this way it can be seen that there
are $2^5$ subsets possible. If the original set is called S then the set of all subsets (that
Dempster-Shafer takes as its start point) is called $2^S$, the power set.
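
The “1”/“0” labelling described above is just a binary number with one bit per member, which suggests a direct way to enumerate the power set. The following is a minimal sketch; the helper name `power_set` and the five placeholder members are assumptions made purely for illustration.

```python
def power_set(elements):
    """Return every subset of `elements` as a frozenset, one per bit pattern."""
    elements = list(elements)
    subsets = []
    for mask in range(2 ** len(elements)):
        # Bit i of `mask` plays the role of the "1" or "0" written next to element i.
        subset = frozenset(e for i, e in enumerate(elements) if (mask >> i) & 1)
        subsets.append(subset)
    return subsets


S = ["a", "b", "c", "d", "e"]   # a hypothetical 5-member system
subsets = power_set(S)
print(len(subsets))             # 2**5 subsets, including the empty set and S itself
```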

A  good  application  of  Dempster-Shafer  theory  is  covered  in  the  work  of  Zou  et  al.  [3] 
discussed  in  Section  2.2  (page  3)  of  this  report.  Their  robot  divides  its  surroundings 
into  a  grid,  assigning  to  each  cell  in  this  grid  a  mass:  a  measure  of  confidence  in  each 
of  the  alternatives  “occupied”,  “empty”  and  “unknown”.  Although  this  mass  is  strictly 
speaking  not  a  probability,  certainly  the  sum  of  the  masses  of  all  of  the  combinations  of  the 
three  alternatives  (forming  the  power  set)  is  required  to  equal  one.  In  this  case,  because 
“unknown” equals “occupied or empty”, these three alternatives (together with the empty
set,  which  has  mass  zero)  form  the  whole  power  set. 

Dempster-Shafer  theory  gives  a  rule  for  calculating  the  confidence  measure  of  each 
state,  based  on  data  from  both  new  and  old  evidence.  This  rule,  Dempster’s  rule  of 
combination,  can  be  described  for  Zou’s  work  as  follows.  If  the  power  set  of  alternatives 
that  their  robot  builds  is 


{occupied,  empty,  unknown}  which  we  write  as  {O,  E,  U}  ,  (4.1) 


then we consider three masses: the bottom-line mass $m$ that we require, being the confidence
in each element of the power set; the measure of confidence $m_s$ from sensors (which
must be modelled); and the measure of confidence $m_o$ from old existing evidence (which
was the mass $m$ from the previous iteration of Dempster’s rule). As discussed in the next
section, Dempster’s rule of combination then gives, for elements A, B, C of the power set:

$$
m(C) = \frac{\sum\limits_{A \cap B = C} m_s(A)\, m_o(B)}{1 - \sum\limits_{A \cap B = \emptyset} m_s(A)\, m_o(B)} .
\tag{4.2}
$$


Apply  this  to  the  robot’s  search  for  occupied  regions  of  the  grid.  Dempster’s  rule  becomes 


$$
m(O) = \frac{m_s(O)\, m_o(O) + m_s(O)\, m_o(U) + m_s(U)\, m_o(O)}{1 - m_s(O)\, m_o(E) - m_s(E)\, m_o(O)} .
\tag{4.3}
$$

While Zou’s robot explores its surroundings, it calculates $m(O)$ for each point of the
grid that makes up its region of mobility, and plots a point if $m(O)$ is larger than some
preset confidence level. Hopefully, the picture it plots will be a plan of the walls of its
environment.

In practice, as we have already noted, Zou et al. did achieve good results, but the
quality of these was strongly influenced by the choice of parameters determining the sensor
masses $m_s$.
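
To make the update concrete, here is a small sketch of the rule applied to a single grid cell. The occupied mass follows (4.3) directly; the empty and unknown masses are obtained from (4.2) in the same way. The function name `update_cell` and the numerical masses are hypothetical and are not taken from Zou et al.

```python
def update_cell(sensor, old):
    """Combine sensor and old masses over {O, E, U} for one grid cell, following (4.2)."""
    # Products whose sets have an empty intersection: occupied versus empty.
    conflict = sensor["O"] * old["E"] + sensor["E"] * old["O"]
    norm = 1.0 - conflict
    # U = "occupied or empty", so O intersected with U is O, and E with U is E.
    occupied = sensor["O"] * old["O"] + sensor["O"] * old["U"] + sensor["U"] * old["O"]
    empty = sensor["E"] * old["E"] + sensor["E"] * old["U"] + sensor["U"] * old["E"]
    unknown = sensor["U"] * old["U"]
    return {"O": occupied / norm, "E": empty / norm, "U": unknown / norm}


# Hypothetical masses for one cell: vague old evidence, then a sensor return
# that favours "occupied".
m_old = {"O": 0.2, "E": 0.3, "U": 0.5}
m_sensor = {"O": 0.6, "E": 0.1, "U": 0.3}

m_new = update_cell(m_sensor, m_old)
print(m_new)
```

Feeding `m_new` back in as the next `old` argument gives the iteration over time that the text describes.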


4.1  Fusing  Two  Sensors 

As a more extensive example of applying Dempster-Shafer theory, focus again on the
aircraft problem considered in Section 3.2. We will allow two extra states of our knowledge:

1. The “unknown” state, where a decision as to what the aircraft is does not appear to
be possible at all. This is equivalent to the subset {F-111, F/A-18, P-3C}.

2. The “fast” state, where we cannot distinguish between an F-111 and an F/A-18.
This is equivalent to {F-111, F/A-18}.


Suppose then that two sensors allocate masses to the power set as in Table 3; the third
column holds the final fused masses that we are about to calculate. Of the eight subsets
that can be formed from the three aircraft, only five are actually useful, so these are the
only ones allocated any mass. Dempster-Shafer also requires that the masses sum to one
over the whole power set. Remember that the masses are not quite probabilities: for
example if the sensor 1 probability that the target is an F-111 was really just another
word for its mass of 30%, then the extra probabilities given to the F-111 through the sets
of fast and unknown targets would not make any sense.


Table 3: Mass assignments for the various aircraft

Target type   Sensor 1      Sensor 2      Fused masses
              (mass m^1)    (mass m^2)    (mass m^{1,2})
F-111         30%           40%           55%
F/A-18        15%           10%           16%
P-3C          3%            2%            0.4%
Fast          42%           45%           29%
Unknown       10%           3%            0.3%
Total mass    100%          100%          100% (correcting for rounding errors)

These masses are now fused using Dempster’s rule of combination. This rule can in
the first instance be written quite simply as a proportionality, using the notation defined
in Equation (3.30) to denote sensor number as a superscript:

$$
m^{1,2}(C) \propto \sum_{A \cap B = C} m^1(A)\, m^2(B) .
\tag{4.4}
$$

We will combine the data of Table 3 using this rule. For example the F-111:

$$
\begin{aligned}
m^{1,2}(\text{F-111}) &\propto m^1(\text{F-111})\, m^2(\text{F-111}) + m^1(\text{F-111})\, m^2(\text{Fast}) + m^1(\text{F-111})\, m^2(\text{Unknown}) \\
&\quad + m^1(\text{Fast})\, m^2(\text{F-111}) + m^1(\text{Unknown})\, m^2(\text{F-111}) \\
&= 0.30 \times 0.40 + 0.30 \times 0.45 + 0.30 \times 0.03 + 0.42 \times 0.40 + 0.10 \times 0.40 \\
&= 0.47
\end{aligned}
\tag{4.5}
$$

The other relative masses are found similarly. Normalising them by dividing each by their
sum yields the final mass values: the third column of Table 3. The fusion reinforces the
idea that the target is an F-111 and, together with our initial confidence in its being a
fast aircraft, means that we are more sure than ever that it is not a P-3C. Interestingly
though, despite the fact that most of the mass is assigned to the two fast aircraft, the
amount of mass assigned to the “fast” type is not as high as we might expect. Again, this
is a good reason not to interpret Dempster-Shafer masses as probabilities.
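
The whole of the fused column of Table 3 can be generated by looping over every pair of subsets, intersecting them, and accumulating the product of masses on the intersection. The sketch below represents each state as a Python `frozenset`; that representation, and the name `dempster_combine`, are choices made here for illustration only.

```python
from itertools import product

F111, FA18, P3C = "F-111", "F/A-18", "P-3C"
FAST = frozenset({F111, FA18})
UNKNOWN = frozenset({F111, FA18, P3C})

# Sensor masses from Table 3, keyed by the subset each state stands for.
m1 = {frozenset({F111}): 0.30, frozenset({FA18}): 0.15, frozenset({P3C}): 0.03,
      FAST: 0.42, UNKNOWN: 0.10}
m2 = {frozenset({F111}): 0.40, frozenset({FA18}): 0.10, frozenset({P3C}): 0.02,
      FAST: 0.45, UNKNOWN: 0.03}


def dempster_combine(ma, mb):
    """Fuse two mass assignments: accumulate products on A & B, then normalise."""
    relative = {}
    for (a, wa), (b, wb) in product(ma.items(), mb.items()):
        c = a & b
        if c:  # pairs with an empty intersection contribute nothing
            relative[c] = relative.get(c, 0.0) + wa * wb
    total = sum(relative.values())
    return {c: w / total for c, w in relative.items()}


fused = dempster_combine(m1, m2)
for subset, mass in fused.items():
    print(f"{{{', '.join(sorted(subset))}}}: {100 * mass:.1f}%")
```

Dividing by the sum of the surviving products is the normalisation described in the text; the unnormalised F-111 entry is the 0.47 of (4.5).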


We can highlight this apparent anomaly further by reworking the example with a new
set of masses, as shown in Table 4. The second sensor now assigns no mass at all to the
“fast” type. We might interpret this to mean that it has no opinion on whether the aircraft
is fast or not. But such a state of affairs is no different numerically from assigning a zero
mass: as if the second sensor has a strong belief that the aircraft is not fast! As before,
fusing the masses of the first two columns of Table 4 produces the third column. Although
the fused masses still lead to the same belief as previously, the 2% value for $m^{1,2}(\text{Fast})$
is clearly at odds with the conclusion that the target is very probably either an F-111
or an F/A-18. So masses certainly are not probabilities. It might well be that a lack of
knowledge of a state means that we should assign to it a mass higher than zero, but just
what that mass should be, considering the possibly high total number of subsets, is open
to interpretation. However, as we shall see in the next section, the new notions of support
and plausibility introduced by Dempster-Shafer theory go far to rescue this paradoxical
situation.


Table 4: A new set of mass assignments, to highlight the “fast” subset
anomaly in Table 3

Target type   Sensor 1      Sensor 2      Fused masses
              (mass m^1)    (mass m^2)    (mass m^{1,2})
F-111         30%           50%           63%
F/A-18        15%           30%           31%
P-3C          3%            17%           3.5%
Fast          42%           0%            2%
Unknown       10%           3%            0.5%
Total mass    100%          100%          100%

Consider  now  a  new  situation.  Suppose  sensor  1  measures  frequency,  sensor  2  measures 
range  rate  and  cross  section,  and  the  target  is  actually  a  decoy:  a  slow-flying  unmanned 
airborne  vehicle  with  frequency  and  cross  section  typical  of  a  fighter  aircraft,  but  having  a 
very  slow  speed.  Suggested  new  masses  are  given  in  Table  5.  Sensor  1  allocates  masses  as 
before,  but  sensor  2  detects  what  appears  to  be  a  fighter  with  a  very  slow  speed.  Hence  it 
spreads  its  mass  allocation  evenly  across  the  three  aircraft,  while  giving  no  mass  at  all  to 
the  Fast  set.  Like  sensor  1,  it  gives  a  10%  mass  to  the  “Unknown”  set.  As  can  be  seen, 
the  fused  masses  only  strengthen  the  idea  that  the  target  is  a  fighter. 

[In passing, note that a “slow” set cannot simply be introduced from the outset with
only one member (the P-3C), because this set already exists as the “P-3C” set. After all,
Dempster-Shafer deals with all subsets of the superset of possible platforms, and there can
only be one such set containing just the P-3C.]


Table 5: Allocating masses when the target is a decoy, but with no “Decoy”
state specified

Target type   Sensor 1      Sensor 2      Fused masses
              (mass m^1)    (mass m^2)    (mass m^{1,2})
F-111         30%           30%           47%
F/A-18        15%           30%           37%
P-3C          3%            30%           7%
Fast          42%           0%            7%
Unknown       10%           10%           2%
Total mass    100%          100%          100%

Since  sensor  1  measures  only  frequency,  it  will  allocate  most  of  the  mass  to  fighters, 
perhaps  not  ruling  the  P-3C  out  entirely.  On  the  other  hand,  suppose  that  sensor  2  has 
enough  preprocessing  to  realise  that  something  is  amiss;  it  seems  to  be  detecting  a  very 
slow  fighter.  Because  of  this  it  decides  to  distribute  some  mass  evenly  over  the  three 
platforms, but allocates most of the mass to the “Unknown” set, as in Table 6. Again it
can be seen that sensor 1’s measurements are still pushing the fusion towards a fighter
aircraft.

Table 6: Allocating masses when the target is a decoy, still with no “Decoy”
state specified; but now sensor 2 realises there is a problem in its measurements

Target type   Sensor 1      Sensor 2      Fused masses
              (mass m^1)    (mass m^2)    (mass m^{1,2})
F-111         30%           10%           34%
F/A-18        15%           10%           20%
P-3C          3%            10%           4%
Fast          42%           0%            34%
Unknown       10%           70%           8%
Total mass    100%          100%          100%


Because the sensors are yielding conflicting data with no resolution in sight, it appears
that we will have to introduce the decoy as an alternative platform that accounts for the
discrepancy. (Another idea is to use the disfusion idea put forward by Myler [4], but this
has not been pursued in this report.) Consider then the new masses in Table 7. Sensor 1 is
now also open to the possibility that a decoy might be present. The fused masses now show
that the decoy is considered highly likely, but only because sensor 2 allocated so much
mass to it. (If sensor 2 alters its Decoy/Unknown masses from 60/10 to 40/30%, then the
fused decoy mass is reduced from 50 to 33%, while the other masses only change by smaller
amounts.) It is apparent that the assignment of masses to the states is not a trivial task,
and we certainly will not benefit if we lack a good choice of target possibilities. This last
problem is, however, a generic fusion problem, and not an indication of any shortcoming
of Dempster-Shafer theory.


Table 7: Now introducing a “Decoy” state

Target type   Sensor 1      Sensor 2      Fused masses
              (mass m^1)    (mass m^2)    (mass m^{1,2})
F-111         30%           10%           23%
F/A-18        15%           10%           15%
P-3C          3%            10%           4%
Fast          22%           0%            6%
Decoy         20%           60%           50%
Unknown       10%           10%           2%
Total mass    100%          100%          100%


Normalising Dempster’s rule

Because of the seeming lack of significance given to
the “fast” state, perhaps we should have no intrinsic interest in calculating its mass. In
fact, knowledge of this mass is actually not required for the final normalisation (footnote 2), so that
Dempster’s rule is usually written as an equality:

$$
m^{1,2}(C)
= \frac{\sum\limits_{A \cap B = C} m^1(A)\, m^2(B)}{\sum\limits_{A \cap B \neq \emptyset} m^1(A)\, m^2(B)}
= \frac{\sum\limits_{A \cap B = C} m^1(A)\, m^2(B)}{1 - \sum\limits_{A \cap B = \emptyset} m^1(A)\, m^2(B)} .
\tag{4.6}
$$
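
As a worked check of the second form of (4.6), consider the masses of Table 3. The only pairs of sets with an empty intersection are F-111 with F/A-18, F-111 with P-3C, F/A-18 with P-3C, and P-3C with Fast, each counted once for either ordering of the sensors. The grouping below is introduced here only for illustration:

$$
\begin{aligned}
\sum_{A \cap B = \emptyset} m^1(A)\, m^2(B)
&= (0.30 \times 0.10 + 0.15 \times 0.40) + (0.30 \times 0.02 + 0.03 \times 0.40) \\
&\quad + (0.15 \times 0.02 + 0.03 \times 0.10) + (0.03 \times 0.45 + 0.42 \times 0.02) \\
&= 0.090 + 0.018 + 0.006 + 0.0219 = 0.1359 ,
\end{aligned}
$$

so that, using the unrounded relative mass 0.472 for the F-111,

$$
m^{1,2}(\text{F-111}) = \frac{0.472}{1 - 0.1359} = \frac{0.472}{0.8641} \simeq 0.546 ,
$$

in agreement with the 55% obtained by dividing by the sum of the relative masses.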


Footnote 2: The normalisation arises in the following way. Because the sum of the masses of each sensor is required
to be one, it must be true that the sum of all products of masses (one from each sensor) must also be one.
But these products are just all the possible numbers that appear in Dempster’s rule of combination (4.4).
So this sum can be split into two parts: terms where the sets involved have a nonempty intersection and
thus appear somewhere in the calculation, and terms where the sets involved have an empty intersection
and so don’t appear. To normalise, we’ll ultimately be dividing each relative mass by the sum of all
products that do appear in Dempster’s rule, or (perhaps the easier number to evaluate) one minus the
sum of all products that don’t appear.


Dempster-Shafer in Tracking

A comparison of Dempster-Shafer fusion in Equation (4.6)
and Bayes fusion in (3.34) shows that there is no time evolution in (4.6).
But we can allow for it after the sensors have been fused, by a further application of
Dempster’s rule, where the sets A, B in (4.6) now refer to new and old data. Zou’s robot
is an example of this sort of fusion from the literature, as discussed in Sections 2.2 and 4
of this report (pages 3 and 21).


4.2  Three  or  More  Sensors 


In  the  case  of  three  or  more  sensors,  Dempster’s  rule  might  in  principle  be  applied 
in  different  ways  depending  on  which  order  is  chosen  for  the  sensors.  But  it  turns  out 
that  because  the  rule  is  only  concerned  with  set  intersections,  the  fusion  order  becomes 
irrelevant.  Thus  three  sensors  fuse  to  give 


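$$
m^{1,2,3}(D)
= \frac{\sum\limits_{A \cap B \cap C = D} m^1(A)\, m^2(B)\, m^3(C)}{\sum\limits_{A \cap B \cap C \neq \emptyset} m^1(A)\, m^2(B)\, m^3(C)}
= \frac{\sum\limits_{A \cap B \cap C = D} m^1(A)\, m^2(B)\, m^3(C)}{1 - \sum\limits_{A \cap B \cap C = \emptyset} m^1(A)\, m^2(B)\, m^3(C)} ,
\tag{4.7}
$$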


and  higher  numbers  are  dealt  with  similarly. 
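
The claim that fusion order is irrelevant can be checked numerically. In the sketch below the first two sensors carry the masses of Table 3, while the third sensor's masses are invented purely for this check; the function `combine` and the helper `masses` are likewise illustrative names, not part of the report.

```python
from functools import reduce
from itertools import product

F111, FA18, P3C = "F-111", "F/A-18", "P-3C"
FAST = frozenset({F111, FA18})
UNKNOWN = frozenset({F111, FA18, P3C})


def masses(f111, fa18, p3c, fast, unknown):
    """Build a mass assignment keyed by the subset each state stands for."""
    return {frozenset({F111}): f111, frozenset({FA18}): fa18,
            frozenset({P3C}): p3c, FAST: fast, UNKNOWN: unknown}


m1 = masses(0.30, 0.15, 0.03, 0.42, 0.10)   # Table 3, sensor 1
m2 = masses(0.40, 0.10, 0.02, 0.45, 0.03)   # Table 3, sensor 2
m3 = masses(0.35, 0.20, 0.05, 0.30, 0.10)   # hypothetical third sensor


def combine(*sensors):
    """Dempster's rule for any number of sensors at once, in the manner of (4.7)."""
    relative = {}
    for combo in product(*(s.items() for s in sensors)):
        common = reduce(lambda a, b: a & b, (subset for subset, _ in combo))
        if common:  # drop combinations whose overall intersection is empty
            weight = 1.0
            for _, w in combo:
                weight *= w
            relative[common] = relative.get(common, 0.0) + weight
    total = sum(relative.values())
    return {c: w / total for c, w in relative.items()}


all_at_once = combine(m1, m2, m3)
pairwise_12_3 = combine(combine(m1, m2), m3)
pairwise_1_23 = combine(m1, combine(m2, m3))
reordered = combine(m3, m1, m2)

for subset in all_at_once:
    print(sorted(subset), round(all_at_once[subset], 4), round(pairwise_12_3[subset], 4))
```

All four results agree to within floating-point rounding, which is what the set-intersection argument in the text predicts.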


4.3  Support  and  Plausibility 

Dempster-Shafer  theory  contains  two  new  ideas  that  are  foreign  to  Bayes  theory.  These 
are  the  notions  of  support  and  plausibility.  For  example,  the  support  for  the  target  being 
“fast”  is  defined  to  be  the  total  mass  of  all  states  implying  the  “fast”  state.  Thus 

$$
\mathrm{spt}(A) = \sum_{B \subseteq A} m(B) .
\tag{4.8}
$$

The support is a kind of loose lower limit to the uncertainty. On the other hand, a loose
upper limit to the uncertainty is the plausibility. This is defined, for the “fast” state, as
the total mass of all states that don’t contradict the “fast” state. In other words:

$$
\mathrm{pls}(A) = \sum_{A \cap B \neq \emptyset} m(B) .
\tag{4.9}
$$
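
Definitions (4.8) and (4.9) translate almost word for word into code once each state is represented as a set. The sketch below applies them to the sensor 1 masses of Table 3; the function names and the `frozenset` representation are assumptions made here for illustration.

```python
F111, FA18, P3C = "F-111", "F/A-18", "P-3C"
FAST = frozenset({F111, FA18})
UNKNOWN = frozenset({F111, FA18, P3C})

# Sensor 1 masses from Table 3.
m1 = {frozenset({F111}): 0.30, frozenset({FA18}): 0.15, frozenset({P3C}): 0.03,
      FAST: 0.42, UNKNOWN: 0.10}


def support(masses, a):
    """Total mass of all sets B contained in A, as in (4.8)."""
    return sum(w for b, w in masses.items() if b <= a)


def plausibility(masses, a):
    """Total mass of all sets B that intersect A, as in (4.9)."""
    return sum(w for b, w in masses.items() if a & b)


for name, state in [("F-111", frozenset({F111})), ("Fast", FAST), ("Unknown", UNKNOWN)]:
    print(f"{name}: spt = {100 * support(m1, state):.0f}%, "
          f"pls = {100 * plausibility(m1, state):.0f}%")
```

For the F-111 this gives a support of 30% and a plausibility of 82%, the latter being the F-111 mass plus the masses of the Fast and Unknown sets that contain it.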

The supports and plausibilities for the masses of Table 3 are given in Table 8. Interpreting
the probability of the state as lying roughly somewhere between the support and the plausibility
gives the following results for what the target might be, based on the fused data:
there is a good possibility of its being an F-111; a reasonable chance of an F/A-18, and
almost no chance of its being a P-3C; this last goes hand in hand with the virtual certainty
that the target is fast. Finally, the last implied probability might look nonsensical:
it might appear to suggest that there is a 100% lack of knowledge of what the target is,
despite all that has just been said. But this is not what the masses imply at all. What
they do imply is that there is complete certainty that the target is unknown. And that
is quite true: the target identity is unknown. But what is also implied by the 100% is
that there is complete certainty that the target is one of the elements in the superset of
platforms, even if we cannot be sure which one. So in this sense it is important to populate
the Unknown set with all possible platforms. We have used such set intersections as

$$
\{\text{F-111}\} \cap \text{Unknown} = \{\text{F-111}\} ,
\tag{4.10}
$$

because Dempster-Shafer theory treats the Unknown set as a superset. But this vagueness
of just what is meant by an “Unknown” state can and does give rise to apparent
contradictions in Dempster-Shafer theory.


Table 8: Supports and plausibilities associated with Table 3

              Sensor 1        Sensor 2        Fused masses
Target type   Spt     Pls     Spt     Pls     Spt     Pls
F-111         30%     82%     40%     88%     55%     84%
F/A-18        15%     67%     10%     58%     16%     45%
P-3C          3%      13%     2%      5%      0.4%    1%
Fast          87%     97%     95%     98%     99%     ~100%
Unknown       100%    100%    100%    100%    100%    100%

A major feature of the ideas of support and plausibility is that when they are calculated
for the seemingly anomalous masses of Table 4, the results are far more in accordance with
our expectations about how prominently the “fast” state should appear, provided that the
target really is not a decoy. These new supports and plausibilities are shown in Table 9.
The “fast” state, which was allocated no mass by sensor 2 (perhaps on account of no
specific information being available to this sensor on which to base a mass estimate), now
has supports and plausibilities that accord with the conclusion that the target is probably
an F-111. The reason for this is that the support for the “fast” state is a sum of masses
that correspond to sets having only fast targets, while the plausibility of the “fast” state
is a (bigger) sum of masses that correspond to sets having at least one fast target. So
the supports and plausibilities of the “fast” state now inherit some of the weight given to
the F-111 and F/A-18 aircraft, which accords with our intuition. It appears then, that
calculating support and plausibility can be of more value than simply calculating fused
masses.


Table 9: Supports and plausibilities associated with the seemingly anomalous
Table 4

              Sensor 1        Sensor 2        Fused masses
Target type   Spt     Pls     Spt     Pls     Spt     Pls
F-111         30%     82%     50%     53%     63%     65.5%
F/A-18        15%     67%     30%     33%     31%     33.5%
P-3C          3%      13%     17%     20%     3.5%    4%
Fast          87%     97%     80%     83%     96%     96.5%
Unknown       100%    100%    100%    100%    100%    100%


5 Comparing Dempster-Shafer and Bayes

The  major  difference  between  these  two  theories  is  that  Bayes  works  with  probabilities, 
which  is  to  say  rigorously-defined  numbers  that  reflect  how  often  an  event  will  happen  if 
an  experiment  is  performed  a  large  number  of  times.  On  the  other  hand,  Dempster- 
Shafer  theory  considers  a  space  of  elements  that  each  reflect  not  what  Nature  chooses, 
but  rather  the  state  of  our  knowledge  after  making  a  measurement.  Thus,  Bayes  does 
not  use  a  specific  state  called  “unknown  emitter  type”— although  after  applying  Bayes 
theory,  we  might  well  have  no  clear  winner,  and  will  decide  that  the  state  of  the  target  is 
best  described  as  unknown.  On  the  other  hand,  Dempster-Shafer  certainly  does  require 
us  to  include  this  “unknown  emitter  type”  state,  because  that  can  well  be  the  state  of  our 
knowledge  at  any  time.  Of  course  the  plausibilities  and  supports  that  Dempster-Shafer 
generates  also  may  or  may  not  give  a  clear  winner  for  what  the  state  of  the  target  is,  but 
this  again  is  distinct  from  the  introduction  into  that  theory  of  the  “unknown  emitter  type” 
state,  which  is  always  done. 

The  fact  that  we  tend  to  think  of  Dempster-Shafer  masses  somewhat  nebulously  as 
probabilities  suggests  that  we  should  perhaps  use  real  probabilities  when  we  can,  but 
Dempster-Shafer  theory  doesn’t  demand  this. 

Both theories have a certain initial requirement. Dempster-Shafer theory requires
masses to be assigned in a meaningful way to the various alternatives, including the
“unknown” state; whereas Bayes theory requires prior probabilities, although at least for
Bayes, the alternatives to which they’re applied are all well defined. One factor in
choosing one approach over the other is the extent to which prior information is available.
Although Dempster-Shafer theory doesn’t need prior probabilities to function, it does
require some preliminary assignment of masses that reflects our initial knowledge of the
system.

Unlike  Bayes  theory,  Dempster-Shafer  theory  explicitly  allows  for  an  undecided  state 
of  our  knowledge.  It  can  of  course  sometimes  be  far  safer  to  be  undecided  about  what 
a  target  is,  than  to  decide  wrongly  and  act  accordingly  with  what  might  be  disastrous 
consequences.  But  as  the  example  of  Table  4  shows,  fused  masses  relating  to  target  sets 
that  contain  more  than  one  element  can  sometimes  be  ambiguous.  Nevertheless,  Dempster- 
Shafer  theory  attempts  to  fix  this  paradox  by  introducing  the  notions  of  support  and 
plausibility. 

These  notions  of  support  and  plausibility,  dealing  with  the  state  of  our  knowledge, 
contrast  with  the  Bayes  approach  which  concerns  itself  with  classical  probability  theory 
only.  On  the  other  hand,  while  Bayes  theory  might  be  thought  of  as  more  antiquated  in 
that  sense,  the  pedigree  of  probability  theory  gives  it  an  edge  over  Dempster-Shafer  in 
terms  of  being  better  understood  and  accepted. 

Dempster-Shafer  calculations  tend  to  be  longer  and  more  involved  than  their  Bayes 
analogues  (which  are  not  required  to  work  with  all  the  elements  of  a  set);  and  despite  the 
fact  that  reports  such  as  [6]  and  [26]  indicate  that  Dempster-Shafer  can  sometimes  perform 
better  than  Bayes  theory,  Dempster-Shafer’s  computational  disadvantages  do  nothing  to 
increase  its  popularity. 

Braun [26] has performed a Monte Carlo comparison between the Dempster-Shafer
and Bayes approaches to data fusion. The paper begins with a short overview of
Dempster-Shafer theory. It simply but clearly defines the Dempster-Shafer power set approach, along
with the probability structure built upon this set: basic probability assignments, belief
and plausibility functions. It follows this with a simple but very clear example of
Dempster-Shafer formalism by applying the central rule of the theory, the Dempster combination
rule, to a set of data.

What  is  not  at  all  clear  is  precisely  which  sort  of  algorithm  Braun  is  implementing  to 
run  the  Monte  Carlo  simulations,  and  how  the  data  is  generated.  He  considers  a  set  of 
sensors  observing  objects.  These  objects  can  belong  to  any  one  of  a  number  of  classes, 
with  the  job  of  the  sensors  being  to  decide  to  which  class  each  object  belongs.  Specific 
numbers  are  not  mentioned,  although  Braun  does  plot  the  number  of  correct  assignments 
versus  the  total  number  of  fusion  events  for  zero  to  2500  events. 

The  results  of  the  simulations  show  fairly  linear  plots  for  both  the  Dempster-Shafer  and 
Bayes  approaches.  The  Bayes  approach  rises  to  a  maximum  of  1700  successes  in  the  2500 
fusion  instances,  while  the  Dempster-Shafer  mode  attains  a  maximum  of  2100  successes — 
which  would  seem  to  place  it  as  the  more  successful  theory,  although  the  author  does  not 
say  as  much  directly.  He  does  produce  somewhat  obscure  plots  showing  finer  details  of 
the  Bayes  and  Dempster-Shafer  successes  as  functions  of  the  degree  of  confidence  in  the 
various  hypotheses  that  make  up  his  system.  What  these  show  is  that  both  methods  are 
robust  over  the  entire  sensor  information  domain,  and  generally  where  one  succeeds  or 
fails  the  other  will  do  the  same,  with  just  a  slight  edge  being  given  to  Dempster-Shafer  as 
compared  with  the  Bayes  approach. 


6  Concluding  Remarks 

Although  data  fusion  still  seems  to  take  tracking  as  its  prototype,  fusion  applications 
are  beginning  to  be  produced  in  numerous  other  areas.  Not  all  of  these  uses  have  a 
statistical  basis  however;  often  the  focus  is  just  on  how  to  fuse  data  in  whichever  way,  with 
the  question  of  whether  that  fusion  is  the  best  in  some  sense  not  always  being  addressed. 
Nor  can  it  always  be,  since  very  often  the  calculations  involved  might  be  prohibitively  many 
and  complex.  Currently  too,  there  is  still  a  good  deal  of  philosophising  about  pertinent 
data  fusion  issues,  and  the  lack  of  hard  rules  to  back  this  up  is  partly  due  to  the  difficulty 
in finding common ground for the many applications to which fusion is now being applied.


Appendix A Gaussian Distribution Theorems

The  following  theorems  are  special  cases  of  the  one-dimensional  results  that  the  product 
of  Gaussians  is  another  Gaussian,  and  the  integral  of  a  Gaussian  is  also  another  Gaussian. 

The notation is as follows. Just as a Gaussian distribution in one dimension is written
in terms of its mean $\mu$ and variance $\sigma^2$ as

$$
N(x; \mu, \sigma^2) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left[\frac{-(x-\mu)^2}{2\sigma^2}\right] ,
\tag{A1}
$$

so also a Gaussian distribution in an n-dimensional vector $x$ is denoted through its mean
vector $\mu$ and covariance matrix $P$ in the following way:

$$
N(x; \mu, P) = \frac{1}{|P|^{1/2} (2\pi)^{n/2}} \exp\left[-\tfrac{1}{2} (x-\mu)^T P^{-1} (x-\mu)\right] = N(x - \mu;\, 0, P) .
\tag{A2}
$$


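Theorem 1

$$
\frac{N(x_1; \mu_1, P_1)\; N(x_2; H x_1, P_2)}{N(x_2; H \mu_1, P_3)} = N(x_1; \mu, P) ,
\tag{A3}
$$

where

$$
\begin{aligned}
K &= P_1 H^T (H P_1 H^T + P_2)^{-1} \\
\mu &= \mu_1 + K (x_2 - H \mu_1) \\
P &= (1 - K H) P_1 .
\end{aligned}
\tag{A4}
$$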

The  method  of  proving  the  above  theorem  is  relatively  well  known,  being  first  shown  in 
[22]  and  later  appearing  in  a  number  of  texts.  However,  the  proof  of  the  next  theorem 
which  deals  with  the  Chapman-Kolmogorov  Theorem  is  not  that  well  known. 
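
Theorem 1 can be spot-checked numerically in one dimension, where every matrix reduces to a scalar. The sketch below is only a sanity check, not a proof. It assumes $P_3 = H P_1 H^T + P_2$, which is the value consistent with Theorem 2 but is not stated explicitly above, and all numerical values are arbitrary.

```python
import math


def gauss(x, mean, var):
    """One-dimensional Gaussian density N(x; mean, var), as in (A1)."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


# Arbitrary scalar test values.
mu1, p1 = 1.0, 2.0
h, p2 = 0.5, 0.3
x1, x2 = 1.7, 0.8

p3 = h * p1 * h + p2                 # assumed form of P_3 (see lead-in)
k = p1 * h / (h * p1 * h + p2)       # K of (A4)
mu = mu1 + k * (x2 - h * mu1)        # mu of (A4)
p = (1.0 - k * h) * p1               # P of (A4)

lhs = gauss(x1, mu1, p1) * gauss(x2, h * x1, p2) / gauss(x2, h * mu1, p3)
rhs = gauss(x1, mu, p)
print(lhs, rhs)
```

The two printed numbers agree to rounding error for any choice of `x1` and `x2`, as the theorem requires.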


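Theorem 2

$$
\int_{-\infty}^{\infty} dx_1\, N(x_1; \mu_1, P_1)\; N(x_2; F x_1, P_2) = N(x_2; \mu, P) ,
\tag{A5}
$$

where

$$
\begin{aligned}
\mu &= F \mu_1 \\
P &= F P_1 F^T + P_2 .
\end{aligned}
\tag{A6}
$$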

Here, we present a proof of the above theorem by directly solving the integral. Note
that in Gaussian integrals, $P_1$ and $P_2$ are symmetric, which means their inverses will be
too, a fact that we will use often in our derivation.
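
Before working through the algebra, it can be reassuring to confirm Theorem 2 numerically in the scalar case. The sketch below integrates the left hand side of (A5) with a simple trapezoidal rule and compares it with the right hand side; the test values, integration limits and function names are arbitrary choices made for this illustration.

```python
import math


def gauss(x, mean, var):
    """One-dimensional Gaussian density N(x; mean, var), as in (A1)."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def prediction_integral(x2, mu1, p1, f, p2, half_width=12.0, steps=20001):
    """Trapezoidal estimate of the integral over x1 of N(x1; mu1, p1) N(x2; f*x1, p2)."""
    sigma = math.sqrt(p1)
    lo = mu1 - half_width * sigma
    hi = mu1 + half_width * sigma
    step = (hi - lo) / (steps - 1)
    total = 0.0
    for i in range(steps):
        x1 = lo + i * step
        weight = 0.5 if i in (0, steps - 1) else 1.0   # trapezoid end weights
        total += weight * gauss(x1, mu1, p1) * gauss(x2, f * x1, p2)
    return total * step


# Arbitrary scalar test values.
mu1, p1, f, p2, x2 = 1.0, 2.0, 0.5, 0.3, 0.8

lhs = prediction_integral(x2, mu1, p1, f, p2)
rhs = gauss(x2, f * mu1, f * p1 * f + p2)   # N(x2; F mu1, F P1 F^T + P2) from (A6)
print(lhs, rhs)
```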

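The left hand side of Equation (A5) is

$$
\begin{aligned}
&\int_{-\infty}^{\infty} dx_1\, N(x_1; \mu_1, P_1)\; N(x_2; F x_1, P_2) = \frac{1}{(2\pi)^{n/2} |P_1|^{1/2} (2\pi)^{n/2} |P_2|^{1/2}} \\
&\qquad \times \int \exp -\tfrac{1}{2} \left[ (x_1 - \mu_1)^T P_1^{-1} (x_1 - \mu_1) + (x_2 - F x_1)^T P_2^{-1} (x_2 - F x_1) \right] dx_1 .
\end{aligned}
\tag{A7}
$$

Write the integrand on the right hand side as $e^{-E/2}$, so that

$$
E = (x_1 - \mu_1)^T P_1^{-1} (x_1 - \mu_1) + (x_2 - F x_1)^T P_2^{-1} (x_2 - F x_1) .
\tag{A8}
$$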

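If we define

$$
\begin{aligned}
A &= x_2 - F \mu_1 , \\
B &= x_1 - \mu_1 ,
\end{aligned}
\tag{A9}
$$

then it follows that $x_2 - F x_1 = A - F B$, in which case

$$
\begin{aligned}
E &= B^T P_1^{-1} B + (A - F B)^T P_2^{-1} (A - F B) \\
&= B^T P_1^{-1} B + A^T P_2^{-1} A - B^T F^T P_2^{-1} A - A^T P_2^{-1} F B + B^T F^T P_2^{-1} F B .
\end{aligned}
\tag{A10}
$$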

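Group the first and last terms to write

$$
E = B^T \left( P_1^{-1} + F^T P_2^{-1} F \right) B + A^T P_2^{-1} A - B^T F^T P_2^{-1} A - A^T P_2^{-1} F B .
\tag{A11}
$$

It will be convenient to introduce two new matrices:

$$
\begin{aligned}
M^{-1} &= P_1^{-1} + F^T P_2^{-1} F , \\
P &= P_2 + F P_1 F^T .
\end{aligned}
\tag{A12}
$$

Note that because $P_1$ and $P_2$ are symmetric, so will $M$ and $M^{-1}$ also be, which we make
use of frequently. With the first term of (A11) rewritten in terms of $M$, $E$ becomes

$$
E = B^T M^{-1} B + A^T P_2^{-1} A - B^T F^T P_2^{-1} A - A^T P_2^{-1} F B .
\tag{A13}
$$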

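We can simplify $E$ by first inverting $P$. The very useful matrix inversion lemma (footnote 3) gives

$$
P^{-1} = \left(P_2 + F P_1 F^T\right)^{-1} = P_2^{-1} - P_2^{-1} F M F^T P_2^{-1} , \tag{A14}
$$

which rearranges trivially to give

$$
P_2^{-1} = P^{-1} + P_2^{-1} F M F^T P_2^{-1} . \tag{A15}
$$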
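The inversion step (A14) is easy to check numerically, including for a non-square $F$. The following NumPy sketch uses randomly generated symmetric positive-definite matrices; the helper `random_spd`, the dimensions and the seed are hypothetical choices made purely for illustration.

```python
import numpy as np

def random_spd(n, rng):
    """Random symmetric positive-definite n x n matrix (illustrative only)."""
    a = rng.standard_normal((n, n))
    return a @ a.T + n * np.eye(n)

rng = np.random.default_rng(0)
n1, n2 = 3, 2                       # hypothetical dimensions of x1 and x2
P1, P2 = random_spd(n1, rng), random_spd(n2, rng)
F = rng.standard_normal((n2, n1))   # deliberately non-square

P2_inv = np.linalg.inv(P2)
M = np.linalg.inv(np.linalg.inv(P1) + F.T @ P2_inv @ F)   # M from (A12)
P = P2 + F @ P1 @ F.T                                     # P from (A12)

lhs = np.linalg.inv(P)                                    # left side of (A14)
rhs = P2_inv - P2_inv @ F @ M @ F.T @ P2_inv              # right side of (A14)
lemma_holds = np.allclose(lhs, rhs)
print(lemma_holds)
```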

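We now insert this last expression into the second term of (A13), giving

$$
\begin{aligned}
E &= B^T M^{-1} B + A^T P^{-1} A + A^T P_2^{-1} F M F^T P_2^{-1} A - B^T F^T P_2^{-1} A - A^T P_2^{-1} F B \\
  &= \left(B - M F^T P_2^{-1} A\right)^T M^{-1} \left(B - M F^T P_2^{-1} A\right) + A^T P^{-1} A .
\end{aligned} \tag{A16}
$$

Defining $\mu_2 = \mu_1 + M F^T P_2^{-1} A$ produces $B - M F^T P_2^{-1} A = x_1 - \mu_2$, in which case

$$
E = (x_1 - \mu_2)^T M^{-1} (x_1 - \mu_2) + A^T P^{-1} A . \tag{A17}
$$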
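The completed-square form (A17) can be compared directly with the original exponent (A8) for arbitrary inputs. The sketch below is one hedged way to do so with NumPy; `quadratic_forms`, `random_spd`, the dimensions and the seed are hypothetical names and values introduced only for this check.

```python
import numpy as np

def quadratic_forms(x1, x2, mu1, P1, F, P2):
    """Return E computed from (A8) and from (A17) so they can be compared."""
    P1_inv, P2_inv = np.linalg.inv(P1), np.linalg.inv(P2)
    r1, r2 = x1 - mu1, x2 - F @ x1
    e_direct = r1 @ P1_inv @ r1 + r2 @ P2_inv @ r2            # (A8)

    M = np.linalg.inv(P1_inv + F.T @ P2_inv @ F)              # (A12)
    P = P2 + F @ P1 @ F.T                                     # (A12)
    A = x2 - F @ mu1                                          # (A9)
    mu2 = mu1 + M @ F.T @ P2_inv @ A                          # mean of the x1 Gaussian
    d = x1 - mu2
    e_completed = d @ np.linalg.inv(M) @ d + A @ np.linalg.inv(P) @ A   # (A17)
    return e_direct, e_completed

def random_spd(n, rng):
    """Random symmetric positive-definite n x n matrix (illustrative only)."""
    a = rng.standard_normal((n, n))
    return a @ a.T + n * np.eye(n)

rng = np.random.default_rng(1)
n1, n2 = 3, 2
e_direct, e_completed = quadratic_forms(
    rng.standard_normal(n1), rng.standard_normal(n2), rng.standard_normal(n1),
    random_spd(n1, rng), rng.standard_normal((n2, n1)), random_spd(n2, rng))
print(e_direct, e_completed)
```

Only the first term of (A17) depends on $x_1$, which is the property the rest of the proof relies on.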

Footnote 3: The matrix inversion lemma says that for matrices $a$, $b$, $c$, $d$ of appropriate size and invertibility:

$$
(a + bcd)^{-1} = a^{-1} - a^{-1} b \left(c^{-1} + d a^{-1} b\right)^{-1} d a^{-1}
$$
Hence the right hand side of Equation (A7) becomes

$$
\frac{e^{-\frac{1}{2} A^T P^{-1} A}}{(2\pi)^{n/2} |P_1|^{1/2} (2\pi)^{n/2} |P_2|^{1/2}} \int \exp\left\{-\frac{1}{2}\left[(x_1 - \mu_2)^T M^{-1} (x_1 - \mu_2)\right]\right\} dx_1 . \tag{A18}
$$


This is a great improvement over (A7), because the integration variable $x_1$ only appears in
a simple Gaussian integral, and so can be integrated out. But before doing that integration,
we will show that the normalisation factors can be simplified, by means of the following
fact:

$$
|P_1 P_2| = |P M| \tag{A19}
$$

To prove this fact, we begin by rewriting $P$ in terms of $M$, $P_1$ and $P_2$:

$$
\begin{aligned}
P &= F P_1 F^T + P_2 \\
  &= \left(F P_1 F^T P_2^{-1} + 1\right) P_2 .
\end{aligned} \tag{A20}
$$


It will prove useful to factor out $F$, but unfortunately because $F$ is in general not square
and so not invertible, we cannot just introduce factors of $F^{-1}$ to effect this. However,
it’s quite sufficient to make use of a “right inverse”, through introducing a factor of
$FF^T(FF^T)^{-1}$, since this is always well defined. In that case

$$
\begin{aligned}
P &= \left(F P_1 F^T P_2^{-1} + 1\right) F F^T \left(FF^T\right)^{-1} P_2 \\
  &= \left(F P_1 F^T P_2^{-1} F + F\right) F^T \left(FF^T\right)^{-1} P_2 \\
  &= F P_1 \left(F^T P_2^{-1} F + P_1^{-1}\right) F^T \left(FF^T\right)^{-1} P_2 \\
  &= F P_1 M^{-1} F^T \left(FF^T\right)^{-1} P_2 .
\end{aligned} \tag{A21}
$$

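If we now multiply both sides by $M$ and then take the determinant of each, we obtain

$$
\begin{aligned}
|PM| &= \left|F P_1 M^{-1} F^T \left(FF^T\right)^{-1} P_2 M\right| \\
     &= |F|\,|P_1|\,|M^{-1}|\,|F^T|\,|FF^T|^{-1}\,|P_2|\,|M| \\
     &= |P_1 P_2|
\end{aligned} \tag{A22}
$$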


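since the various determinants cancel. QED. This fact allows $|P_1|^{1/2} |P_2|^{1/2}$ in
Equation (A18) to be replaced by $|P|^{1/2} |M|^{1/2}$, so that the Gaussian integral
over $x_1$, taken together with its normalisation factor $(2\pi)^{n/2} |M|^{1/2}$, is equal to 1; and so
we arrive at a simple expression for Equation (A7):

$$
\begin{aligned}
\int_{-\infty}^{\infty} dx_1\, N(x_1; \mu_1, P_1)\, N(x_2; Fx_1, P_2)
  &= \frac{e^{-\frac{1}{2} A^T P^{-1} A}}{(2\pi)^{n/2} |P|^{1/2}} \\
  &= \frac{1}{(2\pi)^{n/2} |P|^{1/2}} \exp\left\{-\frac{1}{2}\left[(x_2 - F\mu_1)^T P^{-1} (x_2 - F\mu_1)\right]\right\} \\
  &= N(x_2; F\mu_1, P) .
\end{aligned} \tag{A23}
$$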


This  completes  the  proof. 
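The determinant fact (A19) used in the normalisation can also be checked numerically. When $F$ is not square, $P$ and $M$ have different sizes, so the hedged NumPy sketch below compares the products of determinants, $|P|\,|M|$ and $|P_1|\,|P_2|$, which is the form the normalisation factors actually need. The names `determinant_products` and `random_spd`, the dimensions and the seed are illustrative assumptions.

```python
import numpy as np

def determinant_products(P1, P2, F):
    """Return (|P1| |P2|, |P| |M|) with P and M defined as in (A12)."""
    M = np.linalg.inv(np.linalg.inv(P1) + F.T @ np.linalg.inv(P2) @ F)
    P = F @ P1 @ F.T + P2
    return (np.linalg.det(P1) * np.linalg.det(P2),
            np.linalg.det(P) * np.linalg.det(M))

def random_spd(n, rng):
    """Random symmetric positive-definite n x n matrix (illustrative only)."""
    a = rng.standard_normal((n, n))
    return a @ a.T + n * np.eye(n)

rng = np.random.default_rng(2)
n1, n2 = 3, 2                        # hypothetical dimensions of x1 and x2
det_p1p2, det_pm = determinant_products(
    random_spd(n1, rng), random_spd(n2, rng), rng.standard_normal((n2, n1)))
print(det_p1p2, det_pm)
```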